public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 5/7] XFS: Make use of the init-once slab optimisation.
       [not found]   ` <20080814190001.GA19070@infradead.org>
@ 2008-08-18  0:19     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2008-08-18  0:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Aug 14, 2008 at 03:00:01PM -0400, Christoph Hellwig wrote:
> On Thu, Aug 14, 2008 at 05:14:41PM +1000, Dave Chinner wrote:
> > To avoid having to initialise some fields of the XFS inode
> > on every allocation, we can use the slab init-once feature
> > to initialise them. All we have to guarantee is that when
> > we free the inode, all it's entries are in the initial state.
> > Add asserts where possible to ensure debug kernels check this
> > initial state before freeing and after allocation.
> 
> Looks good, and I think it this can be moved before the series and
> submitted ASAP.

I'll reorder it....

> > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > index cf6754a..4f4c939 100644
> > --- a/fs/xfs/xfs_itable.c
> > +++ b/fs/xfs/xfs_itable.c
> > @@ -594,21 +594,21 @@ xfs_bulkstat(
> >  						/*
> >  						 * Get the inode cluster buffer
> >  						 */
> > -						ASSERT(xfs_inode_zone != NULL);
> > -						ip = kmem_zone_zalloc(xfs_inode_zone,
> > -								      KM_SLEEP);
> > -						ip->i_ino = ino;
> > -						ip->i_mount = mp;
> > -						spin_lock_init(&ip->i_flags_lock);
> >  						if (bp)
> >  							xfs_buf_relse(bp);
> > +						ip = xfs_inode_alloc(mp, ino);
> > +						if (!ip) {
> > +							bp = NULL;
> > +							rval = ENOMEM;
> > +							break;
> > +						}
> >  						error = xfs_itobp(mp, NULL, ip,
> >  								&dip, &bp, bno,
> >  								XFS_IMAP_BULKSTAT,
> >  								XFS_BUF_LOCK);
> 
> Yikes, what a mess - eventually we should convert this to opencoded cals
> to xfs_imap and xfs_imap_to_bp, but before the function needs to be
> split into manageable chunk.  Another item on the todo list..

Yeah, bulkstat is an utter, utter mess. There's a lot of work needed
to clean it up...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 4/7] XFS: Never call mark_inode_dirty_sync() directly
       [not found]   ` <20080814194702.GB12237@infradead.org>
@ 2008-08-18  0:19     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2008-08-18  0:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Aug 14, 2008 at 03:47:02PM -0400, Christoph Hellwig wrote:
> On Thu, Aug 14, 2008 at 05:14:40PM +1000, Dave Chinner wrote:
> > Once the Linux inode and the XFS inode are combined, we cannot rely
> > on just check if the linux inode exists as a method of determining
> > if it is valid or not. Hence we should always call
> > xfs_mark_inode_dirty_sync() instead as it does the correct checks to
> > determine if the liinux inode is in a valid state or not.
> 
> Makes sense, and another candidate to put at the beginning of the series
> or rather just put in now.

Ok. I'll reorder it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 6/7] XFS: Combine the XFS and Linux inodes.
       [not found]   ` <20080814200006.GC12237@infradead.org>
@ 2008-08-18  0:34     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2008-08-18  0:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Aug 14, 2008 at 04:00:06PM -0400, Christoph Hellwig wrote:
> > +	if (!(inode->i_state & I_CLEAR)) {
> >  		ip->i_d.di_atime.t_sec = (__int32_t)inode->i_atime.tv_sec;
> >  		ip->i_d.di_atime.t_nsec = (__int32_t)inode->i_atime.tv_nsec;
> >  	}
> 
> Actually we can just do this unconditionally, as the atime values don't
> become invalid just because the VFS doesn't know about the inode
> anymore.  And two useless because already previously updated assignments
> are cheaper than a branch.

Ok, I'll make it unconditional.

> > + * We are always called with an uninitialised linux inode here.
> > + * We need to initialise the necessary fields and take a reference
> > + * on it.
> >   */
> >  void
> >  xfs_setup_inode(
> >  	struct xfs_inode	*ip)
> >  {
> > -	struct inode		*inode = ip->i_vnode;
> > +	struct inode		*inode = &ip->i_vnode;
> 
> VFS_I?

Yup, I thought I caught all of them :/

> 
> > +	inode->i_ino = ip->i_ino;
> > +	inode->i_state = I_NEW|I_LOCK;
> > +	inode_used(ip->i_mount->m_super, inode);
> > +	ASSERT(atomic_read(&inode->i_count) == 1);
> 
> Where does inode_used come from?  (It's also a rather ugly name..)

Ah, separate patch not sent. basically does:

void inode_used(struct super_block *sb, struct inode *inode)
{
       spin_lock(&inode_lock);
       inodes_stat.nr_inodes++;
       list_add(&inode->i_list, &inode_in_use);
       list_add(&inode->i_sb_list, &sb->s_inodes);
       spin_unlock(&inode_lock);
}

> > +/* XXX: development debug only */
> >  STATIC struct inode *
> >  xfs_fs_alloc_inode(
> >  	struct super_block	*sb)
> >  {
> > -	return kmem_zone_alloc(xfs_vnode_zone, KM_SLEEP);
> > +	BUG();
> >  }
> 
> Actually keeping this one is a good idea, even if it's just to catch
> really dumb things like the iget_locked OpenAFS's cache manager does on
> random filesystems..

Ok, I'll leave that in.

> > -void
> > -xfs_inode_init_once(
> > +STATIC void
> > +xfs_fs_inode_init_once(
> 
> might be a good idea to already use that name in the patch that
> introduces it :)

The problem was that we have two init functions at this point;
one for the vnode, one for the inode....

> >  /*
> > + * we need to provide an empty inode free function to prevent
> > + * the generic code from trying to free ouuur combined inode.
> > + */
> > +STATIC void
> > +xfs_fs_destroy_inode(
> > +	struct inode		*inode)
> > +{
> > +	return;
> > +}
> 
> Why it's this kept in the original place, close to alloc_inode?

Just the way things happened. I'll move it back.

> Also the return statement is superflous.

Yup.

> >  /* convert from xfs inode to vfs inode */
> >  static inline struct inode *VFS_I(struct xfs_inode *ip)
> >  {
> > -	return (struct inode *)ip->i_vnode;
> > +	return (struct inode *)&ip->i_vnode;
> >  }
> 
> No need for the cast in either version..

Ok, I'll kill it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 7/7] XFS: don't use vnodes where unnecessary
       [not found]   ` <20080814201022.GA20557@infradead.org>
@ 2008-08-18  0:42     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2008-08-18  0:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Aug 14, 2008 at 04:10:22PM -0400, Christoph Hellwig wrote:
> >  	do {
> > +		struct inode	*inode;
> > +		boolean_t	vnode_refed;
> > +		xfs_inode_t	*ip = NULL;
> 
> This should not be inode_refed, no? :)

Yeah ;)

> > -		vp = VFS_I(ip);
> > -		if (!vp) {
> > +		if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) {
> 
> These changes really need to be folded into the previous patch for
> bisectability reasons..

Right - this was catching all this bits that I missed in the
previous patch ;)

> 
> > +		inode = VFS_I(ip);
> > +		if (VN_BAD(inode)) {
> 
> Just use is_bad_inode directly.  (Also in a few other places)

Will do.

> >  		} else {
> > -			/* safe to unlock here as we have a reference */
> > +			if (!xfs_ilock_nowait(ip, lock_flags)) {
> > +				/* leave it to reclaim */
> > +				read_unlock(&pag->pag_ici_lock);
> > +				continue;
> > +			}
> >  			read_unlock(&pag->pag_ici_lock);
> >  		}
> 
> Maybe not for this patch, but in general, why do we even bother about
> the inodes we can't get a reference to, shouldn't we just always leave
> this to reclaim?

Right now, we either get a reference or an ilock to prevent reclaim
from freeing it. The case ofthe vnode disappearing made it difficult
to make a clear delineation. With the combined inode, it is much
easier to make this distinction, but if we are doing ascending order
writeback of inodes, we really want to get all the inodes that are
dirty regardless of whether they are in reclaim or not.

> > -		if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
> > +		if ((flags & SYNC_DELWRI) && VN_DIRTY(VFS_I(ip))) {
> 
> Why not use the "inode" variable we have in scope anyway?

Yup, should do that.

> > +	if (!(inode->i_state & I_CLEAR))
> > +		return atomic_read(&inode->i_count);
> 
> Well, it's just zero when the inode is out of the vfs, so we could
> probably simply do it unconditional.

Ok, I'll have a look at that.

> 
> > +		vp = vn_grab(VFS_I(ip));
> > +		if (vp) {
> 
> Please switch this one to igrab, and the inode name, too.  As this
> is the last caller of vn_grab we can kill it now.

Ok, I'll fix that one up, too.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/7] RFC: combine linux and XFS inodes
       [not found] ` <20080814194550.GA12237@infradead.org>
@ 2008-08-18  1:13   ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2008-08-18  1:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Aug 14, 2008 at 03:45:50PM -0400, Christoph Hellwig wrote:
> On Thu, Aug 14, 2008 at 05:14:36PM +1000, Dave Chinner wrote:
> > However, this means the linux inodes are unhashed, which means we
> > now need to do our own tracking of dirty state. We do this by
> > hooking ->dirty_inode and ->set_page_dirty to move the inode to the
> > superblock dirty list when appropriate. We also need to hook
> > ->drop_inode to ensure we do writeback of dirty inodes during
> > reclaim. In future, this can be moved entirely into XFS based on
> > radix tree tags to track dirty inodes instead of a separate list.
> 
> This part (patches 1, 2 and 3) is horrible, and I think avoidable.

Yeah, it's not pretty in it's initial form.

> We can just insert the inode into the Linux inode hash anyway, even
> if we never use it later.

Ok, that will avoid the writeback bits, However it doesn't avoid
the need for these hooks - the next 3 patches after this add dirty
tagging to the inode radix trees via ->dirty_inode and
->set_page_dirty, then use that for inode writeback clustering.

> That avoids these whole three patches
> and all the duplication of core code inside XFS, including the
> inode_lock issue and the potential problem of getting out of sync
> when the core code is updated.

The adding of the inode to the superblock dirty lists is only
temporary - with the tracking of dirty inodes in the radix trees
we can clean inodes much more effectively ourselves than pdflush
can because we know what are optimal write patterns and pdflush
doesn't...

As for cleaning the inode in ->drop_inode, I was planning on
letting reclaim handle the dirty inode case for both linked and
unlinked inodes so that we can batch the data and inode writeback
and move it out of the direct VM reclaim path. That is, allow the
shrinker simply to mark inodes for reclaim, then allow XFS to batch
the work as efficiently as possible in the background...

> If you really, really want to avoid inserting the inode into the Linux
> inode cache (and that would need a sound reason) the way to go would
> be to remove the assumptions of no writeback for unhashed inodes form
> the core code and replace it with a flag that's normally set/cleared
> during hashing/unhashing but could also be set/cleared from XFS.

As I mentioned above, I'm looking to remove the writeback of inodes
almost completely out of the VFS hands and tightly integrate it into
the internal structures of XFS.

e.g. to avoid problems like synchronous RMW cycles on inode cluster
buffers we need to move to a multi-stage writeback infrastructure
that the VFS simply cannot support at the moment. I'd like to get
that structure in place before considering promoting it at the VFS
level. Basically we need:

	pass 1: collect inodes to be written
	pass 2: extract inodes with data and sort into optimal data
		writeback order, issue data async data writes
	pass 3: issue async readahead and pin all inode cluster
		buffers to be written in memory.
	pass 4: if sync flush, wait for all data writeback to
		complete. Force the log (async) to unpin all inodes
		that allocations have been done on.
	pass 5: write all inodes back to buffers and issue async.
	pass 6: if sync flush, wait for inode writeback to complete.

And of course, this can be done in parallel across multiple AGs at
once.  With dirty tagging in the radix tree we have all the
collection, sorting and parallel access infrstrastructure we need
already in place....

FWIW, the inode sort and cluster readahead pass can make 3-4 orders
of magnitude difference in inode writeback speeds under workloads
that span a large number of files on systems with limited memory
(think typical NFS servers).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-08-18  1:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1218698083-11226-1-git-send-email-david@fromorbit.com>
     [not found] ` <1218698083-11226-6-git-send-email-david@fromorbit.com>
     [not found]   ` <20080814190001.GA19070@infradead.org>
2008-08-18  0:19     ` [PATCH 5/7] XFS: Make use of the init-once slab optimisation Dave Chinner
     [not found] ` <1218698083-11226-5-git-send-email-david@fromorbit.com>
     [not found]   ` <20080814194702.GB12237@infradead.org>
2008-08-18  0:19     ` [PATCH 4/7] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
     [not found] ` <1218698083-11226-7-git-send-email-david@fromorbit.com>
     [not found]   ` <20080814200006.GC12237@infradead.org>
2008-08-18  0:34     ` [PATCH 6/7] XFS: Combine the XFS and Linux inodes Dave Chinner
     [not found] ` <1218698083-11226-8-git-send-email-david@fromorbit.com>
     [not found]   ` <20080814201022.GA20557@infradead.org>
2008-08-18  0:42     ` [PATCH 7/7] XFS: don't use vnodes where unnecessary Dave Chinner
     [not found] ` <20080814194550.GA12237@infradead.org>
2008-08-18  1:13   ` [PATCH 0/7] RFC: combine linux and XFS inodes Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox