From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH RFC 00/18] xfs: sparse inode chunks
Date: Sat, 26 Jul 2014 10:03:35 +1000 [thread overview]
Message-ID: <20140726000335.GE20518@dastard> (raw)
In-Reply-To: <20140725163056.GA3350@laptop.bfoster>
On Fri, Jul 25, 2014 at 12:30:57PM -0400, Brian Foster wrote:
> On Fri, Jul 25, 2014 at 08:32:11AM +1000, Dave Chinner wrote:
> > On Thu, Jul 24, 2014 at 10:22:50AM -0400, Brian Foster wrote:
> > > Hi all,
> > >
> > > This is a first pass at sparse inode chunk support for XFS. Some
> > > background on this work is available here:
> > >
> > > http://oss.sgi.com/archives/xfs/2013-08/msg00346.html
> > >
> > > The basic idea is to allow the partial allocation of inode chunks into
> > > fragmented regions of free space. This is accomplished through addition
> > > of a holemask field into the inobt record that defines what portion(s)
> > > of an inode chunk are invalid (i.e., holes in the chunk). This work is
> > > not quite complete, but is at a point where I'd like to start getting
> > > feedback on the design and what direction to take for some of the known
> > > gaps.
> > >
> > > The basic breakdown of functionality in this set is as follows:
> > >
> > > - Patches 1-2 - A couple generic cleanups that are dependencies for later
> > > patches in the series.
> > > - Patches 3-5 - Basic data structure update, feature bit and minor
> > > helper introduction.
> > > - Patches 6-7 - Update v5 icreate logging and recovery to handle sparse
> > > inode records.
> > > - Patches 8-13 - Allocation support for sparse inode records. Physical
> > > chunk allocation and individual inode allocation.
> > > - Patches 14-16 - Deallocation support for sparse inode chunks. Physical
> > > chunk deallocation, individual inode free and cluster free.
> > > - Patch 17 - Fixes for bulkstat/inumbers.
> > > - Patch 18 - Activate support for sparse chunk allocation and
> > > processing.
> > >
> > > This work is lightly tested for regression (some xfstests failures due
> > > to repair) and basic functionality. I have a new xfstests test I'll
> > > forward along for demonstration purposes.
> > >
> > > Some notes on gaps in the design:
> > >
> > > - Sparse inode chunk allocation granularity:
> > >
> > > The current minimum sparse chunk allocation granularity is the cluster
> > > size.
> >
> > Looking at the patchset (I got to patch 5 that first uses this),
> > this is problematic. the cluster size is currently a kernel
> > implementation detail, and not something defined by the on-disk
> > format. We can change the cluster size in the kernel and not affect
> > the format on disk. Making the cluster size a part of the disk
> > format by defining it to be the resolution of sparse inode chunks
> > changes that - it's now a part of the on-disk inode format, and that
> > greatly limits what we can do with it.
> >
>
> I was going off the inoalignmt bit that's set at mkfs time, but looking
> at the code I think I see what you mean...
>
> > > My initial attempts at this work tried to redefine to the minimum
> > > chunk length based on the holemask granularity (a la the stale macro I
> > > seemingly left in this series ;), but this involves tweaking the
> > > codepaths that use the cluster size (i.e., imap) which proved rather
> > > hairy.
> >
> > This is where we need to head towards, though. The cluster size is
> > currently the unit of inode IO, so that needs to be influenced by the
> > sparse inode chunk granularity. Yes, we can define the inode chunk
> > granularity to be the same as the cluster size, but that simply
> > means we need to configure the cluster size appropriately at mount.
> > It doesn't mean we need to change what cluster size means or it's
> > implementation....
> >
>
> I don't think I've necessarily encoded the cluster size into the disk
> format explicitly, but rather used it as a heuristic for how to size the
> sparse inode chunk allocations. I think what you're saying is that the
> cluster size can change across a mount or code update, while the on-disk
> allocation state will not. Thus we can't always go on the assumption
> that the on-disk allocations will be sane with regard to the current
> cluster size. For example, I suppose if the cluster size increased after
> we have some existing sparse allocations we'd end up with some broken
> buffer mappings for inode chunks.
Yup, that's the problem in a nutshell: cluster size is not fixed.
> > Again, that's kernel inode buffer cache implementaiton details, not
> > something that matters for the on-disk format. So really these need
> > to be separated. Probably means we need a "sparse inode allocation
> > alignment" field in the superblock to define this. Having
> > the kernel reject sparse alignments it can't support from the
> > initial implementation means we can improve the kernel
> > implementation over time and (eventually) support sub-cluster sized
> > sparse inode allocation.
> >
> > i.e. initial implementation only supports sparse alignment ==
> > cluster size, and rejects everything else....
> >
>
> Yeah, that's pretty much what I was trying to accomplish, just not
> encoded properly with regard to the superblock options. E.g., I
> overloaded the cluster size in a fragile way. Instead, explicitly set
> the sparse inode allocation granularity and let that influence the
> cluster size. I suspect that means we could even still use the cluster
> size heuristic to define the default alloc. granularity, but of course
> we aren't married to it so can evaluate that independently. That sounds
> much nicer, thanks.
Right. We may not do anything else in the short or medium term, but
having a separate superblock field gives us the flexibility for
change the inode cluster implementation/behaviour in the future...
> > dynamically, because it means the same upgrade/downdgrade cycle can
> > have different results simply due to filesystem freespace
> > fragmentation patterns...
> >
> > Perhaps an xfs-admin command to turn the feature on dynamicallyi for
> > existing filesystems, kind of like what we did with lazy superblock
> > counters when they were introduced?
> >
>
> Hmm, I suppose that does create a new and interesting dynamic with
> regard to the feature bit (non-deterministic backwards compatibility).
> One could certainly value backwards compatibility over this particular
> feature, and there is currently no way to control it. I'll look into
> doing something with xfs_admin. In fact, I was thinking of adding
> something to tune the cluster size bit to get around the v5 scaling
> issue anyways.
What v5 scalability issue is that? I don't recall any outstanding
issues with inode cluster IO....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-07-26 0:03 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-24 14:22 [PATCH RFC 00/18] xfs: sparse inode chunks Brian Foster
2014-07-24 14:22 ` [PATCH 01/18] xfs: refactor xfs_inobt_insert() to eliminate loop and support variable count Brian Foster
2014-07-24 22:10 ` Dave Chinner
2014-07-28 16:03 ` Brian Foster
2014-07-28 23:32 ` Dave Chinner
2014-07-29 14:43 ` Brian Foster
2014-07-24 14:22 ` [PATCH 02/18] xfs: pass xfs_mount directly to xfs_ialloc_cluster_alignment() Brian Foster
2014-07-24 14:22 ` [PATCH 03/18] xfs: define sparse inode chunks v5 sb feature bit and helper function Brian Foster
2014-07-24 17:08 ` Mark Tinguely
2014-07-24 17:37 ` Brian Foster
2014-07-24 18:38 ` Mark Tinguely
2014-07-24 19:38 ` Brian Foster
2014-07-24 23:35 ` Dave Chinner
2014-07-24 14:22 ` [PATCH 04/18] xfs: introduce inode record hole mask for sparse inode chunks Brian Foster
2014-07-24 22:14 ` Dave Chinner
2014-07-28 16:16 ` Brian Foster
2014-08-07 15:18 ` Brian Foster
2014-07-24 14:22 ` [PATCH 05/18] xfs: create macros/helpers for dealing with " Brian Foster
2014-07-24 22:13 ` Dave Chinner
2014-07-24 14:22 ` [PATCH 06/18] xfs: pass inode count through ordered icreate log item Brian Foster
2014-07-24 14:22 ` [PATCH 07/18] xfs: handle sparse inode chunks in icreate log recovery Brian Foster
2014-07-24 14:22 ` [PATCH 08/18] xfs: create helper to manage record overlap for sparse inode chunks Brian Foster
2014-07-24 22:41 ` Dave Chinner
2014-07-28 16:19 ` Brian Foster
2014-07-29 0:07 ` Dave Chinner
2014-07-29 15:10 ` Brian Foster
2014-07-24 14:22 ` [PATCH 09/18] xfs: allocate sparse inode chunks on full chunk allocation failure Brian Foster
2014-07-24 14:23 ` [PATCH 10/18] xfs: set sparse inodes feature bit when a sparse chunk is allocated Brian Foster
2014-07-24 22:46 ` Dave Chinner
2014-07-28 16:23 ` Brian Foster
2014-07-24 14:23 ` [PATCH 11/18] xfs: reduce min. inode allocation space requirement for sparse inode chunks Brian Foster
2014-07-24 22:50 ` Dave Chinner
2014-07-24 14:23 ` [PATCH 12/18] xfs: helper to convert inobt record holemask to inode alloc. bitmap Brian Foster
2014-07-24 23:21 ` Dave Chinner
2014-07-24 14:23 ` [PATCH 13/18] xfs: filter out sparse regions from individual inode allocation Brian Foster
2014-07-24 14:23 ` [PATCH 14/18] xfs: update free inode record logic to support sparse inode records Brian Foster
2014-07-24 14:23 ` [PATCH 15/18] xfs: only free allocated regions of inode chunks Brian Foster
2014-07-24 23:24 ` Dave Chinner
2014-07-24 14:23 ` [PATCH 16/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() Brian Foster
2014-07-24 14:23 ` [PATCH 17/18] xfs: use actual inode count for sparse records in bulkstat/inumbers Brian Foster
2014-07-24 23:29 ` Dave Chinner
2014-07-24 14:23 ` [PATCH 18/18] xfs: enable sparse inode chunks for v5 superblocks Brian Foster
2014-07-24 23:34 ` Dave Chinner
2014-07-24 16:28 ` [PATCH RFC 00/18] xfs: sparse inode chunks Brian Foster
2014-07-24 22:32 ` Dave Chinner
2014-07-25 16:30 ` Brian Foster
2014-07-26 0:03 ` Dave Chinner [this message]
2014-07-28 12:14 ` Brian Foster
2014-07-29 0:26 ` Dave Chinner
2014-07-29 15:25 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140726000335.GE20518@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox