public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v3 04/18] xfs: introduce inode record hole mask for sparse inode chunks
Date: Tue, 10 Feb 2015 07:58:37 -0500	[thread overview]
Message-ID: <20150210125837.GA27661@laptop.bfoster> (raw)
In-Reply-To: <20150209214815.GV12722@dastard>

On Tue, Feb 10, 2015 at 08:48:15AM +1100, Dave Chinner wrote:
> On Mon, Feb 09, 2015 at 10:15:02AM -0500, Brian Foster wrote:
> > On Mon, Feb 09, 2015 at 08:57:10AM +1100, Dave Chinner wrote:
> > > On Sun, Feb 08, 2015 at 11:06:01AM -0500, Brian Foster wrote:
> > > > > > @@ -82,7 +84,9 @@ xfs_inobt_update(
> > > > > >  	union xfs_btree_rec	rec;
> > > > > >  
> > > > > >  	rec.inobt.ir_startino = cpu_to_be32(irec->ir_startino);
> > > > > > -	rec.inobt.ir_freecount = cpu_to_be32(irec->ir_freecount);
> > > > > > +	rec.inobt.ir_holemask = cpu_to_be16(irec->ir_holemask);
> > > > > > +	rec.inobt.ir_count = irec->ir_count;
> > > > > > +	rec.inobt.ir_freecount = irec->ir_freecount;
> > > > > >  	rec.inobt.ir_free = cpu_to_be64(irec->ir_free);
> > > > > >  	return xfs_btree_update(cur, &rec);
> > > > > >  }
> > > > > 
> > > > > Hmmm - perhaps a similar set of helpers for sparse inode enabled
> > > > > filesystems
> > > > > 
> > > > 
> > > > We could do that for the insert/update helpers, but note again the
> > > > separate helpers would not split along the lines of sparse vs full
> > > > chunks. Even if we were to change the design such that they do, that
> > > > would have th effect of complicating some of the subsequent code. For
> > > 
> > > I suspect it might, but I really don't like the idea of writing
> > > fields that don't actually exist (and hence are always zero) in
> > > the on-disk format to disk. See, for example, the v5 superblock
> > > field writes are conditional in xfs_sb_to_disk, the v3 inode field
> > > writes are conditional in xfs_dinode_to_disk, etc.
> > > 
> > 
> > Indeed. The point here is just that a v1/v2 differentiation might be
> > more relevant than full vs. sparse. As it is, a sparse inode enabled fs
> > uses all of the fields (e.g., ir_count == 64 on a full chunk). So this
> > is more about providing clean backwards compatibility than deciding
> > between two different "allocation types," so to speak.
> > 
> > Beyond that, I'm more just probing for a clean use of the new on-disk
> > format distinction. The references above actually use the associated
> > feature checks in a single helper as opposed to a separate one, and that
> > seems a bit more clean an approach to me.
> 
> BUt that's because those object have enough information internally
> to determine what the correct action to take is. An inobt record
> doesn't have the internal information to say whay format it is
> supposed to be in. Hence we need to enforce the use of the correct
> marshalling logic in some way...
> 

Sure, some structures have version information that is used to translate
the in-core structure appropriately, but I don't see the existence or
lack of such information as a trigger for using one pattern or another
to do so. xfs_sb_to_disk(), xfs_bmdr_to_bmbt(), are a couple examples
that use the same kind of pattern based on a feature bit.

> > > > example, the merge/update code currently has no reason to care about
> > > > whether it has created a full chunk out of multiple sparse chunks. This
> > > > would require more code to make that distinction simply to pick the
> > > > correct api, for something that can easily be done with a simple generic
> > > > helper. The same goes for things like the finobt, where now
> > > > lookup/update/insert has to make distinctions depending on the type of
> > > > inode record.
> > > 
> > > That was another thing that wasn't clear to me - does the finobt
> > > record format change on disk? I don't think it does, as it only
> > > holds a free count and a free inode bitmask, so it doesn't care
> > > which bits of the chunk are allocated or not....
> > > 
> > 
> > No, it shouldn't change the format. But if we have a separate set of
> > inobt helpers, it has to use one or the other depending on the type of
> > record holding the inode that is freed or allocated.
> 
> Ok, so we now have 3 instances of inode btrees that use the current
> format, and one that uses the new sparse record format. (i.e.
> existing inobt, existing finobt and sparse inode enable finobt vs
> sparse inode inobt). That really, to me, says we need clear
> demarcation....
> 
> > > > An alternate approach could be to stick with the generic helpers, but
> > > > try and tweak them to use the appropriate on-disk format depending on
> > > > the record. Then again, how can one really identify one from the other
> > > > in the lookup or _get_rec() scenarios?
> > > 
> > > Solving these problem was exactly why I suggested different helpers
> > > - the btree ops structure that is chosen for the cursor is in a
> > > place where we know what format we are using, and that avoids
> > > needing "if (xfs_sb_version...)" checks in the helpers to determine
> > > what format we need to use.
> > > 
> > 
> > Hmm, I'm not sure I follow the train of thought here. The examples above
> > use exactly that kind of logic within the helper. Further, some of the
> 
> They use internal object state (i.e. inode version number,
> superblock flag field in sb being translated) to make that decision.
> the inobt changes use external structure state to do the
> conversion...
> 
> > > > Before I go one way or another here with regard to the on-disk data
> > > > structure, care to take a look at the subsequent patch(es) that use
> > > > these helpers? Patch 10 in particular pretty much sums up how these
> > > > helpers are used for sparse inode record management.
> > > 
> > > Yup, I'll get to it.
> > > 
> > 
> > Thanks. I've made a quick pass and most of it makes sense save some of
> > the helper business we're discussing here. I can start into some of the
> > higher level refactoring, try to incorporate the on-disk record format
> > union as clean as I can and we can move forward using the code of the
> > next version, if that's easier. ;)
> 
> Yup, that's fine - refactoring the code might make a difference ;)
> 
> > > > FWIW, comments on the generic bitmap stuff is also appreciated as that's
> > > > another area I'm not totally convinved is done right. :)
> > > 
> > > Why use the generic bitmap stuff rather than the existing XFS code?
> > > If that's just an in-memory change to the free mask processing, then
> > > it can be done separately to the sparse inode functionality, and
> > > probably should be done first in the series....
> > > 
> > 
> > http://oss.sgi.com/archives/xfs/2014-07/msg00380.html
> > 
> > Last comment in that mail. ;) To be honest, I liked the original code
> > better because it didn't require the bitmap_to_u64() helper hack and
> 
> I didn't realise there were arch-specific warts in the generic
> bitmap code.
> 
> > ports directly to userspace (so far I've managed to get away with
> > dropping that stuff entirely in userspace). Alternatively, I've done
> > significantly more testing with the latest code than the initial version
> > so I'd rather not change it back unless a real need arises.
> 
> Well, either way I'd like to avoid nasty conversions if it can be
> avoided. Perhaps look at the in-memory bitmaps being held in LE
> format (i.e. __le64) and using the primitives in
> include/asm-generic/bitops/le.h to manipulate them (i.e
> find_next_zero_bit_le() and friends)? Maybe that would avoid all the
> problems of architectural layout of the bitmap fields and hence make
> conversion simple (i.e. just a le64_to_cpu() call?)
> 

Hmm, interesting. On a quick pass, it doesn't appear to provide some of
the additional functions (e.g., weight, and) beyond set/clear/test/find.
I suspect that means we'd end up open coding such things, possibly
introducing more cpu<->le conversions to go back and forth between
native bitwise ops and the ops provided by the utility functions.

I'll see how it looks with the changes suggested so far and we can
consider whether it's worth converting back from there...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-02-10 12:58 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-06 19:52 [PATCH v3 00/18] xfs: sparse inode chunks Brian Foster
2015-02-06 19:52 ` [PATCH v3 01/18] xfs: add sparse inode chunk alignment superblock field Brian Foster
2015-02-06 22:40   ` Dave Chinner
2015-02-08 16:04     ` Brian Foster
2015-02-08 21:43       ` Dave Chinner
2015-02-06 19:52 ` [PATCH v3 02/18] xfs: use sparse chunk alignment for min. inode allocation requirement Brian Foster
2015-02-06 19:52 ` [PATCH v3 03/18] xfs: sparse inode chunks feature helpers and mount requirements Brian Foster
2015-02-06 22:54   ` Dave Chinner
2015-02-06 19:52 ` [PATCH v3 04/18] xfs: introduce inode record hole mask for sparse inode chunks Brian Foster
2015-02-06 23:16   ` Dave Chinner
2015-02-08 16:06     ` Brian Foster
2015-02-08 21:57       ` Dave Chinner
2015-02-09 15:15         ` Brian Foster
2015-02-09 21:48           ` Dave Chinner
2015-02-10 12:58             ` Brian Foster [this message]
2015-02-06 19:52 ` [PATCH v3 05/18] xfs: create macros/helpers for dealing with " Brian Foster
2015-02-06 19:52 ` [PATCH v3 06/18] xfs: pass inode count through ordered icreate log item Brian Foster
2015-02-06 19:52 ` [PATCH v3 07/18] xfs: handle sparse inode chunks in icreate log recovery Brian Foster
2015-02-06 23:19   ` Dave Chinner
2015-02-08 16:06     ` Brian Foster
2015-02-06 19:52 ` [PATCH v3 08/18] xfs: helpers to convert holemask to/from generic bitmap Brian Foster
2015-02-08 23:54   ` Dave Chinner
2015-02-09 15:15     ` Brian Foster
2015-02-06 19:52 ` [PATCH v3 09/18] xfs: support min/max agbno args in block allocator Brian Foster
2015-02-09  0:01   ` Dave Chinner
2015-02-06 19:52 ` [PATCH v3 10/18] xfs: allocate sparse inode chunks on full chunk allocation failure Brian Foster
2015-02-09  1:25   ` Dave Chinner
2015-02-09 15:20     ` Brian Foster
2015-02-06 19:52 ` [PATCH v3 11/18] xfs: randomly do sparse inode allocations in DEBUG mode Brian Foster
2015-02-08 23:02   ` Dave Chinner
2015-02-09 15:20     ` Brian Foster
2015-02-06 19:52 ` [PATCH v3 12/18] xfs: filter out sparse regions from individual inode allocation Brian Foster
2015-02-08 22:49   ` Dave Chinner
2015-02-06 19:53 ` [PATCH v3 13/18] xfs: update free inode record logic to support sparse inode records Brian Foster
2015-02-08 22:33   ` Dave Chinner
2015-02-06 19:53 ` [PATCH v3 14/18] xfs: only free allocated regions of inode chunks Brian Foster
2015-02-06 19:53 ` [PATCH v3 15/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() Brian Foster
2015-02-06 19:53 ` [PATCH v3 16/18] xfs: use actual inode count for sparse records in bulkstat/inumbers Brian Foster
2015-02-08 22:29   ` Dave Chinner
2015-02-06 19:53 ` [PATCH v3 17/18] xfs: add fs geometry bit for sparse inode chunks Brian Foster
2015-02-06 19:53 ` [PATCH v3 18/18] xfs: enable sparse inode chunks for v5 superblocks Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150210125837.GA27661@laptop.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox