From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 10/10] xfs: don't cache inodes read through bulkstat
Date: Wed, 14 Mar 2012 15:44:01 -0500
Message-ID: <20120314204401.GN7762@sgi.com>
In-Reply-To: <1331095828-28742-11-git-send-email-david@fromorbit.com>
On Wed, Mar 07, 2012 at 03:50:28PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When we read inodes via bulkstat, we generally only read them once
> and then throw them away - they never get used again. If we retain
> them in cache, then it simply causes the working set of inodes and
> other cached items to be reclaimed just so the inode cache can grow.
>
> Avoid this problem by marking inodes read by bulkstat as not to be
> cached and check this flag in .drop_inode to determine whether the
> inode should be added to the VFS LRU or not. If the inode lookup
> hits an already cached inode, then don't set the flag. If the inode
> lookup hits an inode marked with no cache flag, remove the flag and
> allow it to be cached once the current reference goes away.
>
> Inodes marked as not cached will get cleaned up by the background
> inode reclaim or via memory pressure, so they will still generate
> some short term cache pressure. They will, however, be reclaimed
> much sooner and in preference to cache hot inodes.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.

Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
> fs/xfs/xfs_iget.c | 8 ++++++--
> fs/xfs/xfs_inode.h | 4 +++-
> fs/xfs/xfs_itable.c | 3 ++-
> fs/xfs/xfs_super.c | 17 +++++++++++++++++
> 4 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> index 93fc1dc..20ddb1e 100644
> --- a/fs/xfs/xfs_iget.c
> +++ b/fs/xfs/xfs_iget.c
> @@ -290,7 +290,7 @@ xfs_iget_cache_hit(
> if (lock_flags != 0)
> xfs_ilock(ip, lock_flags);
>
> - xfs_iflags_clear(ip, XFS_ISTALE);
> + xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
> XFS_STATS_INC(xs_ig_found);
>
> return 0;
> @@ -315,6 +315,7 @@ xfs_iget_cache_miss(
> struct xfs_inode *ip;
> int error;
> xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ino);
> + int iflags;
>
> ip = xfs_inode_alloc(mp, ino);
> if (!ip)
> @@ -359,8 +360,11 @@ xfs_iget_cache_miss(
> * memory barrier that ensures this detection works correctly at lookup
> * time.
> */
> + iflags = XFS_INEW;
> + if (flags & XFS_IGET_DONTCACHE)
> + iflags |= XFS_IDONTCACHE;
> ip->i_udquot = ip->i_gdquot = NULL;
> - xfs_iflags_set(ip, XFS_INEW);
> + xfs_iflags_set(ip, iflags);
>
> /* insert the new inode */
> spin_lock(&pag->pag_ici_lock);
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index eda4937..096b887 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -374,10 +374,11 @@ xfs_set_projid(struct xfs_inode *ip,
> #define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT)
> #define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
> #define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
> +#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */
>
> /*
> * Per-lifetime flags need to be reset when re-using a reclaimable inode during
> - * inode lookup. Thi prevents unintended behaviour on the new inode from
> + * inode lookup. This prevents unintended behaviour on the new inode from
> * ocurring.
> */
> #define XFS_IRECLAIM_RESET_FLAGS \
> @@ -544,6 +545,7 @@ do { \
> */
> #define XFS_IGET_CREATE 0x1
> #define XFS_IGET_UNTRUSTED 0x2
> +#define XFS_IGET_DONTCACHE 0x4
>
> int xfs_inotobp(struct xfs_mount *, struct xfs_trans *,
> xfs_ino_t, struct xfs_dinode **,
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 751e94f..b832c58 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -76,7 +76,8 @@ xfs_bulkstat_one_int(
> return XFS_ERROR(ENOMEM);
>
> error = xfs_iget(mp, NULL, ino,
> - XFS_IGET_UNTRUSTED, XFS_ILOCK_SHARED, &ip);
> + (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> + XFS_ILOCK_SHARED, &ip);
> if (error) {
> *stat = BULKSTAT_RV_NOTHING;
> goto out_free;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index b1df512..c162765 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -953,6 +953,22 @@ xfs_fs_evict_inode(
> xfs_inactive(ip);
> }
>
> +/*
> + * We do an unlocked check for XFS_IDONTCACHE here because we are already
> + * serialised against cache hits here via the inode->i_lock and igrab() in
> + * xfs_iget_cache_hit(). Hence a lookup that might clear this flag will not be
> + * racing with us, and it avoids needing to grab a spinlock here for every inode
> + * we drop the final reference on.
> + */

I'll try to put this in my own words, just in case it is mystifying for
anyone else. ;)

In this case it is ok to check ip->i_flags without holding
ip->i_flags_lock because we have exclusion from xfs_iget_cache_hit, as
follows:

The 'dropper' would have taken inode->i_lock when the inode's count went
to zero, and if the XFS_IDONTCACHE flag is set, the dropper will return 1
to iput_final, which will result in iput_final skipping the inode lru and
setting I_FREEING immediately, before dropping inode->i_lock and evicting
the inode.

A 'cache hitter' must call igrab in order to get a reference on the
inode. igrab takes the inode->i_lock, and if I_FREEING is set, it
returns NULL; xfs_iget_cache_hit then returns EAGAIN and the lookup is
restarted.

So any 'cache hitter' who could possibly clear the XFS_IDONTCACHE
flag subsequent to the 'dropper' checking it would always be unable to get
a reference, due to I_FREEING having been set by the dropper.
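
To make the ordering concrete, here is a rough single-threaded user-space
model of the argument. It is only a sketch, not the kernel code: every
model_* name and MODEL_* constant is a hypothetical stand-in, the i_lock
is represented only by comments, and the generic_drop_inode() half of the
real .drop_inode is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
#define MODEL_I_FREEING   0x1u  /* models the VFS I_FREEING state bit */
#define MODEL_IDONTCACHE  0x2u  /* models XFS_IDONTCACHE in ip->i_flags */

struct model_inode {
	unsigned int i_state;   /* protected by inode->i_lock in the kernel */
	unsigned int i_flags;   /* models ip->i_flags */
	int          i_count;   /* reference count */
	bool         on_lru;    /* true once the inode was kept for caching */
};

/* Models .drop_inode: non-zero means "do not put this inode on the LRU". */
static int model_drop_inode(const struct model_inode *inode)
{
	/* Unlocked check of the flag; safe per the argument in the email. */
	return (inode->i_flags & MODEL_IDONTCACHE) != 0;
}

/* Models iput_final(): in the kernel this runs with i_lock held. */
static void model_iput_final(struct model_inode *inode)
{
	if (model_drop_inode(inode))
		inode->i_state |= MODEL_I_FREEING; /* set before i_lock is dropped */
	else
		inode->on_lru = true;              /* inode stays cached */
}

/* Models iput(): the 'dropper' releases one reference. */
static void model_iput(struct model_inode *inode)
{
	assert(inode->i_count > 0);
	if (--inode->i_count == 0)
		model_iput_final(inode);
}

/* Models igrab(): takes i_lock and refuses an inode that is being freed. */
static struct model_inode *model_igrab(struct model_inode *inode)
{
	if (inode->i_state & MODEL_I_FREEING)
		return NULL;
	inode->i_count++;
	return inode;
}

/*
 * Models the relevant part of xfs_iget_cache_hit(): returns 0 on a hit,
 * -1 where the real code would return EAGAIN so the lookup is retried.
 */
static int model_cache_hit(struct model_inode *inode)
{
	if (model_igrab(inode) == NULL)
		return -1;
	/* Only reachable while we hold a reference, so no race with the drop. */
	inode->i_flags &= ~MODEL_IDONTCACHE;
	return 0;
}
```

In this model, once model_iput() drops the last reference on an inode that
still carries MODEL_IDONTCACHE, a later model_cache_hit() fails and leaves
the flag alone. If the hit comes in before the final drop, the flag is
cleared and the inode ends up on the LRU instead.
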
I appreciate that you added the comment.

Regards,
Ben

> +STATIC int
> +xfs_fs_drop_inode(
> + struct inode *inode)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> +
> + return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> +}
> +
> STATIC void
> xfs_free_fsname(
> struct xfs_mount *mp)
> @@ -1431,6 +1447,7 @@ static const struct super_operations xfs_super_operations = {
> .dirty_inode = xfs_fs_dirty_inode,
> .write_inode = xfs_fs_write_inode,
> .evict_inode = xfs_fs_evict_inode,
> + .drop_inode = xfs_fs_drop_inode,
> .put_super = xfs_fs_put_super,
> .sync_fs = xfs_fs_sync_fs,
> .freeze_fs = xfs_fs_freeze,
> --
> 1.7.9
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs