public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [RFD 14/17] xfs: separate inode freeing from inactivation
Date: Mon, 12 Aug 2013 23:20:04 +1000
Message-ID: <1376313607-28133-15-git-send-email-david@fromorbit.com>
In-Reply-To: <1376313607-28133-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Inode freeing and unlinked list processing are done as part of the
inactivation transaction when the last reference goes away from the
VFS inode. While it is advantageous to truncate away all the extents
allocated to the inode at this point, it is not necessarily in our
best interests to free the inode immediately.

While the inode is on the unlinked list and there are no more VFS
references to the inode, it is effectively a free inode - the
unlinked list reference tells us this rather than the inode btree
marking the inode free.

If we separate the actual freeing of the inode from the VFS
references, we have an inode that we can reallocate for use without
needing to pass it through the inode allocation btree. That is, we
can allocate directly from the unlinked list in the AG. We already
have the ability to do this for the O_TMPFILE/linkat(2) case where
we allocate directly to the unlinked list and then later link the
referenced inode to a directory and remove it from the unlinked
list.

In this case, if we have an unreferenced inode on the unlinked list,
we can allocate it directly simply by removing it from the unlinked
list. Further, O_TMPFILE allocations can be made effectively without
any transactions being issued at all if there are already free,
unreferenced inodes on the unlinked list.

Hence we need a method of finding inodes that are unreferenced but
on the unlinked list and available for allocation. A simple method for
doing this is using an inode cache radix tree tag on the inodes that
are unlinked and unreferenced but still on the unlinked list. A
simple tag check can tell us if there are any available for this
method of allocation, so there's no overhead to determine what
method to use.
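
As a rough illustration of the tag check, here is a minimal userspace
sketch. It is not kernel code: the structure, the tag array and every
model_* name are hypothetical stand-ins for the per-AG inode cache
radix tree and its tag, and the summary counter plays the role of the
"is anything tagged?" test.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CACHE_SLOTS 256u	/* hypothetical size of the toy cache */

/* Toy stand-in for a per-AG inode cache with one tag bit per slot. */
struct model_ag_cache {
	bool		reusable[MODEL_CACHE_SLOTS]; /* unlinked + unreferenced */
	unsigned int	nr_tagged;		     /* summary of the tag bits */
};

enum model_alloc_method {
	MODEL_ALLOC_FROM_UNLINKED,	/* reuse an inode off the unlinked list */
	MODEL_ALLOC_FROM_BTREE,		/* normal inode btree allocation */
};

/* Tag a cached inode as available for direct reallocation. */
void model_tag_set(struct model_ag_cache *c, unsigned int slot)
{
	assert(slot < MODEL_CACHE_SLOTS);
	if (!c->reusable[slot]) {
		c->reusable[slot] = true;
		c->nr_tagged++;
	}
}

/* Clear the tag once the inode is reallocated or really freed. */
void model_tag_clear(struct model_ag_cache *c, unsigned int slot)
{
	assert(slot < MODEL_CACHE_SLOTS);
	if (c->reusable[slot]) {
		c->reusable[slot] = false;
		c->nr_tagged--;
	}
}

/* The cheap check: no walk needed to decide which method to use. */
enum model_alloc_method model_pick_method(const struct model_ag_cache *c)
{
	return c->nr_tagged ? MODEL_ALLOC_FROM_UNLINKED
			    : MODEL_ALLOC_FROM_BTREE;
}
```

The point of the sketch is only that the decision is a constant-time
test on summary state, not a search of the unlinked lists.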

Further, by using a radix tree tag we can use an inode cache
iterator function to run a periodic worker to remove inodes from the
unlinked list and mark them free in the inode btree. The
advantage of doing the inode freeing in the background is that we do
not have to worry about how quickly we can remove inodes from the
unlinked list as it is no longer in the fast path. This enables us
to use trylock semantics for freeing the inodes and so we can skip
inodes we'd otherwise block on.
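
A minimal single-threaded sketch of one pass of such a worker follows.
Everything here is an assumption made for illustration: the model_*
names are hypothetical, the "lock" is a plain flag rather than a real
inode lock, and "freed" stands in for marking the inode free in the
inode btree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy inode state; not the real in-core inode. */
struct model_unlinked_inode {
	bool tagged;	/* unlinked and unreferenced, waiting to be freed */
	bool locked;	/* somebody else currently holds the inode lock */
	bool freed;	/* marked free in the (modelled) inode btree */
};

/* Trylock semantics: never wait, just report failure. */
bool model_trylock(struct model_unlinked_inode *ip)
{
	if (ip->locked)
		return false;
	ip->locked = true;
	return true;
}

/*
 * One background pass over the cache in index order. Inodes that
 * cannot be locked are skipped and keep their tag, so a later pass
 * picks them up again. Returns the number of inodes skipped.
 */
size_t model_free_pass(struct model_unlinked_inode *inodes, size_t n)
{
	size_t skipped = 0;

	assert(inodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_unlinked_inode *ip = &inodes[i];

		if (!ip->tagged)
			continue;
		if (!model_trylock(ip)) {
			skipped++;
			continue;
		}
		ip->freed = true;	/* remove from list, mark free */
		ip->tagged = false;
		ip->locked = false;
	}
	return skipped;
}
```

Because a skipped inode stays tagged, nothing has to remember it
separately; the next periodic pass retries it.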

Alternatively, we can use the presence of the radix tree tag to
indicate that we need to walk the unlinked inode lists freeing
inodes from them. This may seem appealing until we realise that each
inode on an unlinked list belongs to a different inode chunk due
to the hashing function used. Hence every inode we free will modify
a different btree record and so there is no locality of modification
in the inode btree structures and inode backing buffers.

If we use a radix tree walk, we will process all the free inodes in
a chunk and hence keep good CPU cache locality for all the data
structures that we need to modify for freeing those inodes. This
will be more CPU efficient as the data cache footprint of the walk
will be much smaller and hence we'll stall the CPU a lot less
waiting for cache lines to be loaded from memory.
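
To make the locality argument concrete, here is a small sketch that
compares the two visit orders. It assumes, for illustration only, that
the unlinked list bucket is the AG inode number modulo 64 and that an
inode chunk covers 64 consecutive, 64-aligned inode numbers; the
model_* names and constants are hypothetical and not taken from this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch, see the text above. */
#define MODEL_UNLINKED_BUCKETS	64u
#define MODEL_INODES_PER_CHUNK	64u

/* Which unlinked list bucket an AG inode number hashes to. */
uint32_t model_bucket(uint32_t agino)
{
	return agino % MODEL_UNLINKED_BUCKETS;
}

/* Which inode chunk (and so which btree record) covers it. */
uint32_t model_chunk(uint32_t agino)
{
	return agino / MODEL_INODES_PER_CHUNK;
}

/*
 * Count how often consecutive inodes in a visit order fall in
 * different chunks. Each switch means touching a different btree
 * record and a different backing buffer.
 */
size_t model_chunk_switches(const uint32_t *order, size_t n)
{
	size_t switches = 0;

	assert(order != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		if (model_chunk(order[i]) != model_chunk(order[i - 1]))
			switches++;
	return switches;
}
```

Under these assumptions, walking one bucket (for example inodes 3, 67,
131 and 195, all in bucket 3) changes chunk on every step, while an
index-order walk over inodes 128 to 131 stays within a single chunk.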

This background freeing process allows us to make further changes to
the unlinked lists that avoid unsolvable deadlocks. For example, if
we cannot lock inodes on the unlinked list, we can simply have the
freeing of the inode retried at some point in the future
automatically.

Finally, we need an inode flag to indicate that the inode is in this
special unlinked, unreferenced state when lockless cache lookups are
done. This ensures that we can safely avoid these inodes as lookup
circumstances allow and work correctly with the inode reclaim state
machine. e.g. for allocation optimisations, we want to be able to
find these inodes, but for all other lookups we want an ENOENT to be
returned.
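
A sketch of how such a flag might gate cache lookups is below. The
flag names, the lookup flag and the function are all hypothetical;
this only models the visibility rule described above and says nothing
about how the real lookup or reclaim code would be structured.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-core flag: unlinked, unreferenced, not yet freed. */
#define MODEL_IUNLINKED_FREE	0x1u

/* Hypothetical lookup flag: caller is the allocation optimisation. */
#define MODEL_LOOKUP_ALLOC	0x1u

/* Toy cached inode carrying only the state flags. */
struct model_cached_inode {
	unsigned int flags;
};

/*
 * Decide whether a lookup may see this cached inode.
 * Returns 0 if the inode is visible to the caller, -ENOENT if it is
 * in the special state and the caller is not an allocation lookup.
 */
int model_lookup_check(const struct model_cached_inode *ip,
		       unsigned int lookup_flags)
{
	assert(ip != NULL);

	if (!(ip->flags & MODEL_IUNLINKED_FREE))
		return 0;		/* ordinary inode, not our concern */
	if (lookup_flags & MODEL_LOOKUP_ALLOC)
		return 0;		/* allocation wants to find these */
	return -ENOENT;			/* everyone else sees no inode */
}
```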

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_vnodeops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index dc730ac..db712fb 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -374,6 +374,8 @@ xfs_inactive(
 
 	ASSERT(ip->i_d.di_anextents == 0);
 
+	/* this is where we need to split inactivation and inode freeing */
+
 	/*
 	 * Free the inode.
 	 */
-- 
1.8.3.2


