* [PATCH 0/6] XFS: Track reclaimable inodes in inode cache.
@ 2008-09-13 14:14 Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-09-13 14:14 UTC (permalink / raw)
To: xfs
Move the tracking of reclaimable inodes into the inode radix trees.
This currently does not replace the reclaim flags in the inode,
rather it allows traversal of all reclaimable inodes by walking the
per-AG inode radix trees without needing a separate list. This
enables us to remove a struct list_head from the struct xfs_inode
and the xfs_mount, as well as a filesystem global lock which also
has the benefit of removing a point of serialisation during inode
reclaim.
Like the matching sync code, this also allows reclaim of inodes
in ascending inode numbers which substantially improves I/O
patterns during reclaim driven inode flushing.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/6] XFS: Track reclaimable inodes in inode cache
@ 2008-10-07 21:54 Dave Chinner
2008-10-07 21:54 ` [PATCH 1/6] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
Move the tracking of reclaimable inodes into the inode radix trees.
This currently does not replace the reclaim flags in the inode,
rather it allows traversal of all reclaimable inodes by walking the
per-AG inode radix trees without needing a separate list. This
enables us to remove a struct list_head from the struct xfs_inode
and the xfs_mount, as well as a filesystem global lock which also
has the benefit of removing a point of serialisation during inode
reclaim.
Like the matching sync code, this also allows reclaim of inodes
in ascending inode numbers which substantially improves I/O
patterns during reclaim driven inode flushing.
Version 2:
o clean up series based on review comments
o added bug fix to prevent looping when we overflow the AG
inode number index (fixes sync code as well).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/6] XFS: move inode reclaim functions to xfs_sync.c
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-07 21:54 ` [PATCH 2/6] XFS: rename inode reclaim functions Dave Chinner
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
Background inode reclaim is run by the xfssyncd. Move the
reclaim worker functions to be close to the sync code as
the are very similar in structure and are both run from
the same background thread.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 91 +++++++++++++++++++++++++++++++++++++++++++
fs/xfs/linux-2.6/xfs_sync.h | 3 +
fs/xfs/xfs_inode.h | 2 -
fs/xfs/xfs_vnodeops.c | 90 ------------------------------------------
4 files changed, 94 insertions(+), 92 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index b2b7082..79038ea 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -583,3 +583,94 @@ xfs_syncd_stop(
kthread_stop(mp->m_sync_task);
}
+int
+xfs_finish_reclaim(
+ xfs_inode_t *ip,
+ int locked,
+ int sync_mode)
+{
+ xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino);
+
+ /* The hash lock here protects a thread in xfs_iget_core from
+ * racing with us on linking the inode back with a vnode.
+ * Once we have the XFS_IRECLAIM flag set it will not touch
+ * us.
+ */
+ write_lock(&pag->pag_ici_lock);
+ spin_lock(&ip->i_flags_lock);
+ if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
+ !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+ spin_unlock(&ip->i_flags_lock);
+ write_unlock(&pag->pag_ici_lock);
+ if (locked) {
+ xfs_ifunlock(ip);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ }
+ return 1;
+ }
+ __xfs_iflags_set(ip, XFS_IRECLAIM);
+ spin_unlock(&ip->i_flags_lock);
+ write_unlock(&pag->pag_ici_lock);
+ xfs_put_perag(ip->i_mount, pag);
+
+ /*
+ * If the inode is still dirty, then flush it out. If the inode
+ * is not in the AIL, then it will be OK to flush it delwri as
+ * long as xfs_iflush() does not keep any references to the inode.
+ * We leave that decision up to xfs_iflush() since it has the
+ * knowledge of whether it's OK to simply do a delwri flush of
+ * the inode or whether we need to wait until the inode is
+ * pulled from the AIL.
+ * We get the flush lock regardless, though, just to make sure
+ * we don't free it while it is being flushed.
+ */
+ if (!locked) {
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_iflock(ip);
+ }
+
+ /*
+ * In the case of a forced shutdown we rely on xfs_iflush() to
+ * wait for the inode to be unpinned before returning an error.
+ */
+ if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
+ /* synchronize with xfs_iflush_done */
+ xfs_iflock(ip);
+ xfs_ifunlock(ip);
+ }
+
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_ireclaim(ip);
+ return 0;
+}
+
+int
+xfs_finish_reclaim_all(
+ xfs_mount_t *mp,
+ int noblock,
+ int mode)
+{
+ xfs_inode_t *ip, *n;
+
+restart:
+ XFS_MOUNT_ILOCK(mp);
+ list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
+ if (noblock) {
+ if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
+ continue;
+ if (xfs_ipincount(ip) ||
+ !xfs_iflock_nowait(ip)) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ continue;
+ }
+ }
+ XFS_MOUNT_IUNLOCK(mp);
+ if (xfs_finish_reclaim(ip, noblock, mode))
+ delay(1);
+ goto restart;
+ }
+ XFS_MOUNT_IUNLOCK(mp);
+ return 0;
+}
+
+
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 3b49aa3..23117a1 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -45,4 +45,7 @@ void xfs_quiesce_attr(struct xfs_mount *mp);
void xfs_flush_inode(struct xfs_inode *ip);
void xfs_flush_device(struct xfs_inode *ip);
+int xfs_finish_reclaim(struct xfs_inode *ip, int locked, int sync_mode);
+int xfs_finish_reclaim_all(struct xfs_mount *mp, int noblock, int mode);
+
#endif
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ab70cf3..f946b13 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -496,8 +496,6 @@ int xfs_isilocked(xfs_inode_t *, uint);
uint xfs_ilock_map_shared(xfs_inode_t *);
void xfs_iunlock_map_shared(xfs_inode_t *, uint);
void xfs_ireclaim(xfs_inode_t *);
-int xfs_finish_reclaim(xfs_inode_t *, int, int);
-int xfs_finish_reclaim_all(struct xfs_mount *, int, int);
/*
* xfs_inode.c prototypes.
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8bd8f19..f6d016e 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2824,96 +2824,6 @@ xfs_reclaim(
return 0;
}
-int
-xfs_finish_reclaim(
- xfs_inode_t *ip,
- int locked,
- int sync_mode)
-{
- xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino);
-
- /* The hash lock here protects a thread in xfs_iget_core from
- * racing with us on linking the inode back with a vnode.
- * Once we have the XFS_IRECLAIM flag set it will not touch
- * us.
- */
- write_lock(&pag->pag_ici_lock);
- spin_lock(&ip->i_flags_lock);
- if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
- !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
- spin_unlock(&ip->i_flags_lock);
- write_unlock(&pag->pag_ici_lock);
- if (locked) {
- xfs_ifunlock(ip);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- }
- return 1;
- }
- __xfs_iflags_set(ip, XFS_IRECLAIM);
- spin_unlock(&ip->i_flags_lock);
- write_unlock(&pag->pag_ici_lock);
- xfs_put_perag(ip->i_mount, pag);
-
- /*
- * If the inode is still dirty, then flush it out. If the inode
- * is not in the AIL, then it will be OK to flush it delwri as
- * long as xfs_iflush() does not keep any references to the inode.
- * We leave that decision up to xfs_iflush() since it has the
- * knowledge of whether it's OK to simply do a delwri flush of
- * the inode or whether we need to wait until the inode is
- * pulled from the AIL.
- * We get the flush lock regardless, though, just to make sure
- * we don't free it while it is being flushed.
- */
- if (!locked) {
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_iflock(ip);
- }
-
- /*
- * In the case of a forced shutdown we rely on xfs_iflush() to
- * wait for the inode to be unpinned before returning an error.
- */
- if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
- /* synchronize with xfs_iflush_done */
- xfs_iflock(ip);
- xfs_ifunlock(ip);
- }
-
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_ireclaim(ip);
- return 0;
-}
-
-int
-xfs_finish_reclaim_all(
- xfs_mount_t *mp,
- int noblock,
- int mode)
-{
- xfs_inode_t *ip, *n;
-
-restart:
- XFS_MOUNT_ILOCK(mp);
- list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
- if (noblock) {
- if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
- continue;
- if (xfs_ipincount(ip) ||
- !xfs_iflock_nowait(ip)) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- continue;
- }
- }
- XFS_MOUNT_IUNLOCK(mp);
- if (xfs_finish_reclaim(ip, noblock, mode))
- delay(1);
- goto restart;
- }
- XFS_MOUNT_IUNLOCK(mp);
- return 0;
-}
-
/*
* xfs_alloc_file_space()
* This routine allocates disk space for the given file.
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/6] XFS: rename inode reclaim functions
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
2008-10-07 21:54 ` [PATCH 1/6] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-07 21:54 ` [PATCH 3/6] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
The function names xfs_finish_reclaim and xfs_finish_reclaim_all
are not very descriptive of what they are reclaiming. Rename to
xfs_reclaim_inode[s] to match the xfs_sync_inodes() function.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 10 +++++-----
fs/xfs/linux-2.6/xfs_sync.h | 4 ++--
fs/xfs/xfs_mount.c | 2 +-
fs/xfs/xfs_vnodeops.c | 2 +-
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 79038ea..34413ce 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -364,7 +364,7 @@ xfs_quiesce_fs(
int count = 0, pincount;
xfs_flush_buftarg(mp->m_ddev_targp, 0);
- xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+ xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
/*
* This loop must run at least twice. The first instance of the loop
@@ -505,7 +505,7 @@ xfs_sync_worker(
if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
- xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+ xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
/* dgc: errors ignored here */
error = XFS_QM_DQSYNC(mp, SYNC_BDFLUSH);
error = xfs_sync_fsdata(mp, SYNC_BDFLUSH);
@@ -584,7 +584,7 @@ xfs_syncd_stop(
}
int
-xfs_finish_reclaim(
+xfs_reclaim_inode(
xfs_inode_t *ip,
int locked,
int sync_mode)
@@ -645,7 +645,7 @@ xfs_finish_reclaim(
}
int
-xfs_finish_reclaim_all(
+xfs_reclaim_inodes(
xfs_mount_t *mp,
int noblock,
int mode)
@@ -665,7 +665,7 @@ restart:
}
}
XFS_MOUNT_IUNLOCK(mp);
- if (xfs_finish_reclaim(ip, noblock, mode))
+ if (xfs_reclaim_inode(ip, noblock, mode))
delay(1);
goto restart;
}
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 23117a1..c1bcd50 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -45,7 +45,7 @@ void xfs_quiesce_attr(struct xfs_mount *mp);
void xfs_flush_inode(struct xfs_inode *ip);
void xfs_flush_device(struct xfs_inode *ip);
-int xfs_finish_reclaim(struct xfs_inode *ip, int locked, int sync_mode);
-int xfs_finish_reclaim_all(struct xfs_mount *mp, int noblock, int mode);
+int xfs_reclaim_inode(struct xfs_inode *ip, int locked, int sync_mode);
+int xfs_reclaim_inodes(struct xfs_mount *mp, int noblock, int mode);
#endif
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 1e10882..a230d8f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1241,7 +1241,7 @@ xfs_unmountfs(
* need to force the log first.
*/
xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
- xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_ASYNC);
+ xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_ASYNC);
XFS_QM_DQPURGEALL(mp, XFS_QMOPT_QUOTALL | XFS_QMOPT_UMOUNTING);
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index f6d016e..1addcc7 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2809,7 +2809,7 @@ xfs_reclaim(
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_iflock(ip);
xfs_iflags_set(ip, XFS_IRECLAIMABLE);
- return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
+ return xfs_reclaim_inode(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
} else {
xfs_mount_t *mp = ip->i_mount;
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/6] XFS: mark inodes for reclaim via a tag in the inode radix tree
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
2008-10-07 21:54 ` [PATCH 1/6] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
2008-10-07 21:54 ` [PATCH 2/6] XFS: rename inode reclaim functions Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-07 21:54 ` [PATCH 4/6] XFS: use the inode radix tree for reclaiming inodes Dave Chinner
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
Prepare for removing the deleted inode list by marking inodes
for reclaim in the inode radix trees so that we can use the
radix trees to find reclaimable inodes.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 41 +++++++++++++++++++++++++++++++++++++++++
fs/xfs/linux-2.6/xfs_sync.h | 4 ++++
fs/xfs/xfs_ag.h | 5 +++++
fs/xfs/xfs_iget.c | 3 +++
fs/xfs/xfs_vnodeops.c | 1 +
5 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 34413ce..9e7f4dc 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -644,6 +644,47 @@ xfs_reclaim_inode(
return 0;
}
+void
+xfs_inode_set_reclaim_tag(
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino);
+
+ read_lock(&pag->pag_ici_lock);
+ spin_lock(&ip->i_flags_lock);
+ radix_tree_tag_set(&pag->pag_ici_root,
+ XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+ spin_unlock(&ip->i_flags_lock);
+ read_unlock(&pag->pag_ici_lock);
+ xfs_put_perag(mp, pag);
+}
+
+void
+__xfs_inode_clear_reclaim_tag(
+ xfs_mount_t *mp,
+ xfs_perag_t *pag,
+ xfs_inode_t *ip)
+{
+ radix_tree_tag_clear(&pag->pag_ici_root,
+ XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+}
+
+void
+xfs_inode_clear_reclaim_tag(
+ xfs_inode_t *ip)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino);
+
+ read_lock(&pag->pag_ici_lock);
+ spin_lock(&ip->i_flags_lock);
+ __xfs_inode_clear_reclaim_tag(mp, pag, ip);
+ spin_unlock(&ip->i_flags_lock);
+ read_unlock(&pag->pag_ici_lock);
+ xfs_put_perag(mp, pag);
+}
+
int
xfs_reclaim_inodes(
xfs_mount_t *mp,
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index c1bcd50..5f6de1e 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -48,4 +48,8 @@ void xfs_flush_device(struct xfs_inode *ip);
int xfs_reclaim_inode(struct xfs_inode *ip, int locked, int sync_mode);
int xfs_reclaim_inodes(struct xfs_mount *mp, int noblock, int mode);
+void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
+ struct xfs_inode *ip);
#endif
diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 729ee3e..2bfd863 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -204,6 +204,11 @@ typedef struct xfs_perag
#endif
} xfs_perag_t;
+/*
+ * tags for inode radix tree
+ */
+#define XFS_ICI_RECLAIM_TAG 0 /* inode is to be reclaimed */
+
#define XFS_AG_MAXLEVELS(mp) ((mp)->m_ag_maxlevels)
#define XFS_MIN_FREELIST_RAW(bl,cl,mp) \
(MIN(bl + 1, XFS_AG_MAXLEVELS(mp)) + MIN(cl + 1, XFS_AG_MAXLEVELS(mp)))
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index c4414e8..a0387f1 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -91,6 +91,9 @@ xfs_iget_cache_hit(
}
xfs_iflags_set(ip, XFS_INEW);
xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+
+ /* clear the radix tree reclaim flag as well. */
+ __xfs_inode_clear_reclaim_tag(mp, pag, ip);
read_unlock(&pag->pag_ici_lock);
XFS_MOUNT_ILOCK(mp);
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 1addcc7..cbde288 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2820,6 +2820,7 @@ xfs_reclaim(
spin_unlock(&ip->i_flags_lock);
list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
XFS_MOUNT_IUNLOCK(mp);
+ xfs_inode_set_reclaim_tag(ip);
}
return 0;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/6] XFS: use the inode radix tree for reclaiming inodes
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
` (2 preceding siblings ...)
2008-10-07 21:54 ` [PATCH 3/6] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-07 21:54 ` [PATCH 5/6] XFS: kill deleted inodes list Dave Chinner
2008-10-07 21:54 ` [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag Dave Chinner
5 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
Use the reclaim tag to walk the radix tree and find
the inodes under reclaim. This was the only user of the
deleted inode list.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 81 +++++++++++++++++++++++++++++++++++++-----
1 files changed, 71 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9e7f4dc..bbb40e2 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -685,32 +685,93 @@ xfs_inode_clear_reclaim_tag(
xfs_put_perag(mp, pag);
}
-int
-xfs_reclaim_inodes(
+
+STATIC void
+xfs_reclaim_inodes_ag(
xfs_mount_t *mp,
- int noblock,
+ int ag,
+ int noblock,
int mode)
{
- xfs_inode_t *ip, *n;
+ xfs_inode_t *ip = NULL;
+ xfs_perag_t *pag = &mp->m_perag[ag];
+ int nr_found;
+ int first_index;
+ int skipped;
restart:
- XFS_MOUNT_ILOCK(mp);
- list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
+ first_index = 0;
+ skipped = 0;
+ do {
+ /*
+ * use a gang lookup to find the next inode in the tree
+ * as the tree is sparse and a gang lookup walks to find
+ * the number of objects requested.
+ */
+ read_lock(&pag->pag_ici_lock);
+ nr_found = radix_tree_gang_lookup_tag(&pag->pag_ici_root,
+ (void**)&ip, first_index, 1,
+ XFS_ICI_RECLAIM_TAG);
+
+ if (!nr_found) {
+ read_unlock(&pag->pag_ici_lock);
+ break;
+ }
+
+ /* update the index for the next lookup */
+ first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+
+ ASSERT(xfs_iflags_test(ip, (XFS_IRECLAIMABLE|XFS_IRECLAIM)));
+
+ /* ignore if already under reclaim */
+ if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
+ read_unlock(&pag->pag_ici_lock);
+ continue;
+ }
+
if (noblock) {
- if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
+ if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+ read_unlock(&pag->pag_ici_lock);
continue;
+ }
if (xfs_ipincount(ip) ||
!xfs_iflock_nowait(ip)) {
xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ read_unlock(&pag->pag_ici_lock);
continue;
}
}
- XFS_MOUNT_IUNLOCK(mp);
+ read_unlock(&pag->pag_ici_lock);
+
+ /*
+ * hmmm - this is an inode already in reclaim. Do
+ * we even bother catching it here?
+ */
if (xfs_reclaim_inode(ip, noblock, mode))
- delay(1);
+ skipped++;
+ } while (nr_found);
+
+ if (skipped) {
+ delay(1);
goto restart;
}
- XFS_MOUNT_IUNLOCK(mp);
+ return;
+
+}
+
+int
+xfs_reclaim_inodes(
+ xfs_mount_t *mp,
+ int noblock,
+ int mode)
+{
+ int i;
+
+ for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+ if (!mp->m_perag[i].pag_ici_init)
+ continue;
+ xfs_reclaim_inodes_ag(mp, i, noblock, mode);
+ }
return 0;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/6] XFS: kill deleted inodes list
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
` (3 preceding siblings ...)
2008-10-07 21:54 ` [PATCH 4/6] XFS: use the inode radix tree for reclaiming inodes Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-07 21:54 ` [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag Dave Chinner
5 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
Now that the deleted inodes list is unused, kill it. This
also removes the i_reclaim list head from the xfs_inode, shrinking
it by two pointers.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_super.c | 2 --
fs/xfs/linux-2.6/xfs_sync.c | 6 ++++++
fs/xfs/xfs_iget.c | 8 --------
fs/xfs/xfs_inode.c | 4 ++--
fs/xfs/xfs_inode.h | 1 -
fs/xfs/xfs_mount.c | 1 -
fs/xfs/xfs_mount.h | 5 +----
fs/xfs/xfs_vnodeops.c | 12 +-----------
8 files changed, 10 insertions(+), 29 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index b893f8c..f4bf7ca 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -907,7 +907,6 @@ xfs_fs_inode_init_once(
atomic_set(&ip->i_iocount, 0);
atomic_set(&ip->i_pincount, 0);
spin_lock_init(&ip->i_flags_lock);
- INIT_LIST_HEAD(&ip->i_reclaim);
init_waitqueue_head(&ip->i_ipin_wait);
/*
* Because we want to use a counting completion, complete
@@ -1549,7 +1548,6 @@ xfs_fs_fill_super(
goto out_free_args;
spin_lock_init(&mp->m_sb_lock);
- mutex_init(&mp->m_ilock);
mutex_init(&mp->m_growlock);
atomic_set(&mp->m_active_trans, 0);
INIT_LIST_HEAD(&mp->m_sync_list);
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index bbb40e2..22006b5 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -644,6 +644,11 @@ xfs_reclaim_inode(
return 0;
}
+/*
+ * We set the inode flag atomically with the radix tree tag.
+ * Once we get tag lookups on the radix tree, this inode flag
+ * can go away.
+ */
void
xfs_inode_set_reclaim_tag(
xfs_inode_t *ip)
@@ -655,6 +660,7 @@ xfs_inode_set_reclaim_tag(
spin_lock(&ip->i_flags_lock);
radix_tree_tag_set(&pag->pag_ici_root,
XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+ __xfs_iflags_set(ip, XFS_IRECLAIMABLE);
spin_unlock(&ip->i_flags_lock);
read_unlock(&pag->pag_ici_lock);
xfs_put_perag(mp, pag);
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index a0387f1..8001338 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -95,10 +95,6 @@ xfs_iget_cache_hit(
/* clear the radix tree reclaim flag as well. */
__xfs_inode_clear_reclaim_tag(mp, pag, ip);
read_unlock(&pag->pag_ici_lock);
-
- XFS_MOUNT_ILOCK(mp);
- list_del_init(&ip->i_reclaim);
- XFS_MOUNT_IUNLOCK(mp);
} else if (!igrab(VFS_I(ip))) {
/* If the VFS inode is being torn down, pause and try again. */
error = EAGAIN;
@@ -419,11 +415,7 @@ xfs_iextract(
write_unlock(&pag->pag_ici_lock);
xfs_put_perag(mp, pag);
- /* Deal with the deleted inodes list */
- XFS_MOUNT_ILOCK(mp);
- list_del_init(&ip->i_reclaim);
mp->m_ireclaims++;
- XFS_MOUNT_IUNLOCK(mp);
}
/*
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 307a63f..456941e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -810,7 +810,7 @@ xfs_inode_alloc(
ASSERT(atomic_read(&ip->i_iocount) == 0);
ASSERT(atomic_read(&ip->i_pincount) == 0);
ASSERT(!spin_is_locked(&ip->i_flags_lock));
- ASSERT(list_empty(&ip->i_reclaim));
+ ASSERT(completion_done(&ip->i_flush));
/*
* initialise the VFS inode here to get failures
@@ -2732,7 +2732,7 @@ xfs_idestroy(
ASSERT(atomic_read(&ip->i_iocount) == 0);
ASSERT(atomic_read(&ip->i_pincount) == 0);
ASSERT(!spin_is_locked(&ip->i_flags_lock));
- ASSERT(list_empty(&ip->i_reclaim));
+ ASSERT(completion_done(&ip->i_flush));
kmem_zone_free(xfs_inode_zone, ip);
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index f946b13..f02535c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -236,7 +236,6 @@ typedef struct dm_attrs_s {
typedef struct xfs_inode {
/* Inode linking and identification information. */
struct xfs_mount *i_mount; /* fs mount struct ptr */
- struct list_head i_reclaim; /* reclaim list */
struct xfs_dquot *i_udquot; /* user dquot */
struct xfs_dquot *i_gdquot; /* group dquot */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a230d8f..794a4e2 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -582,7 +582,6 @@ xfs_mount_common(xfs_mount_t *mp, xfs_sb_t *sbp)
mp->m_blockmask = sbp->sb_blocksize - 1;
mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
mp->m_blockwmask = mp->m_blockwsize - 1;
- INIT_LIST_HEAD(&mp->m_del_inodes);
/*
* Setup for attributes, in case they get created.
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7e02a86..fc70c69 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -248,8 +248,6 @@ typedef struct xfs_mount {
xfs_agnumber_t m_agirotor; /* last ag dir inode alloced */
spinlock_t m_agirotor_lock;/* .. and lock protecting it */
xfs_agnumber_t m_maxagi; /* highest inode alloc group */
- struct list_head m_del_inodes; /* inodes to reclaim */
- mutex_t m_ilock; /* inode list mutex */
uint m_ireclaims; /* count of calls to reclaim*/
uint m_readio_log; /* min read size log bytes */
uint m_readio_blocks; /* min read size blocks */
@@ -312,8 +310,7 @@ typedef struct xfs_mount {
int m_attr_magicpct;/* 37% of the blocksize */
int m_dir_magicpct; /* 37% of the dir blocksize */
__uint8_t m_mk_sharedro; /* mark shared ro on unmount */
- __uint8_t m_inode_quiesce;/* call quiesce on new inodes.
- field governed by m_ilock */
+ __uint8_t m_inode_quiesce;/* call quiesce on new inodes. */
__uint8_t m_sectbb_log; /* sectlog - BBSHIFT */
const struct xfs_nameops *m_dirnameops; /* vector of dir name ops */
int m_dirblksize; /* directory block sz--bytes */
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index cbde288..bfb25f7 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2810,18 +2810,8 @@ xfs_reclaim(
xfs_iflock(ip);
xfs_iflags_set(ip, XFS_IRECLAIMABLE);
return xfs_reclaim_inode(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
- } else {
- xfs_mount_t *mp = ip->i_mount;
-
- /* Protect sync and unpin from us */
- XFS_MOUNT_ILOCK(mp);
- spin_lock(&ip->i_flags_lock);
- __xfs_iflags_set(ip, XFS_IRECLAIMABLE);
- spin_unlock(&ip->i_flags_lock);
- list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
- XFS_MOUNT_IUNLOCK(mp);
- xfs_inode_set_reclaim_tag(ip);
}
+ xfs_inode_set_reclaim_tag(ip);
return 0;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
` (4 preceding siblings ...)
2008-10-07 21:54 ` [PATCH 5/6] XFS: kill deleted inodes list Dave Chinner
@ 2008-10-07 21:54 ` Dave Chinner
2008-10-08 18:21 ` Christoph Hellwig
5 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2008-10-07 21:54 UTC (permalink / raw)
To: xfs
If the last block of the AG has inodes in it and the AG is an
exactly power-of-2 size then the last inode in the AG points
to the last block in the AG. If we try to find the next inode
in the AG by adding one to the inode number, we increment the
inode number past the size of the AG. The result is that the
macro XFS_INO_TO_AGINO() will strip the AG portion of the inode
number and return an inode number of zero.
That is, instead of terminating the lookup loop because we hit the
inode number went outside the valid range for the AG, the search
index returns to zero and we start traversing the radix tree from
the start again. This results in an endless loop in
xfs_sync_inodes_ag().
Fix it be detecting if the new search index decreases as a result of
incrementing the current inode number. That indicate an overflow and
hence that we have finished processing the AG so we can terminate
the loop.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_sync.c | 26 ++++++++++++++++++++++----
1 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 22006b5..ee1648b 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -59,7 +59,7 @@ xfs_sync_inodes_ag(
{
xfs_perag_t *pag = &mp->m_perag[ag];
int nr_found;
- int first_index = 0;
+ uint32_t first_index = 0;
int error = 0;
int last_error = 0;
int fflag = XFS_B_ASYNC;
@@ -97,8 +97,17 @@ xfs_sync_inodes_ag(
break;
}
- /* update the index for the next lookup */
+ /*
+ * Update the index for the next lookup. Catch overflows
+ * into the next AG range which can occur if we have inodes
+ * in the last block of the AG and we are currently
+ * pointing to the last inode.
+ */
first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+ if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) {
+ read_unlock(&pag->pag_ici_lock);
+ break;
+ }
/*
* skip inodes in reclaim. Let xfs_syncsub do that for
@@ -702,7 +711,7 @@ xfs_reclaim_inodes_ag(
xfs_inode_t *ip = NULL;
xfs_perag_t *pag = &mp->m_perag[ag];
int nr_found;
- int first_index;
+ uint32_t first_index;
int skipped;
restart:
@@ -724,8 +733,17 @@ restart:
break;
}
- /* update the index for the next lookup */
+ /*
+ * Update the index for the next lookup. Catch overflows
+ * into the next AG range which can occur if we have inodes
+ * in the last block of the AG and we are currently
+ * pointing to the last inode.
+ */
first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+ if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) {
+ read_unlock(&pag->pag_ici_lock);
+ break;
+ }
ASSERT(xfs_iflags_test(ip, (XFS_IRECLAIMABLE|XFS_IRECLAIM)));
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag
2008-10-07 21:54 ` [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag Dave Chinner
@ 2008-10-08 18:21 ` Christoph Hellwig
2008-10-09 0:02 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-10-08 18:21 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Oct 08, 2008 at 08:54:40AM +1100, Dave Chinner wrote:
> If the last block of the AG has inodes in it and the AG is an
> exactly power-of-2 size then the last inode in the AG points
> to the last block in the AG. If we try to find the next inode
> in the AG by adding one to the inode number, we increment the
> inode number past the size of the AG. The result is that the
> macro XFS_INO_TO_AGINO() will strip the AG portion of the inode
> number and return an inode number of zero.
>
> That is, instead of terminating the lookup loop because we hit the
> inode number went outside the valid range for the AG, the search
> index returns to zero and we start traversing the radix tree from
> the start again. This results in an endless loop in
> xfs_sync_inodes_ag().
>
> Fix it be detecting if the new search index decreases as a result of
> incrementing the current inode number. That indicate an overflow and
> hence that we have finished processing the AG so we can terminate
> the loop.
Shouldn't this get merged into the patch that introduces the radix-tree
based sync?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag
2008-10-08 18:21 ` Christoph Hellwig
@ 2008-10-09 0:02 ` Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2008-10-09 0:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Wed, Oct 08, 2008 at 02:21:38PM -0400, Christoph Hellwig wrote:
> On Wed, Oct 08, 2008 at 08:54:40AM +1100, Dave Chinner wrote:
> > If the last block of the AG has inodes in it and the AG is an
> > exactly power-of-2 size then the last inode in the AG points
> > to the last block in the AG. If we try to find the next inode
> > in the AG by adding one to the inode number, we increment the
> > inode number past the size of the AG. The result is that the
> > macro XFS_INO_TO_AGINO() will strip the AG portion of the inode
> > number and return an inode number of zero.
> >
> > That is, instead of terminating the lookup loop because we hit the
> > inode number went outside the valid range for the AG, the search
> > index returns to zero and we start traversing the radix tree from
> > the start again. This results in an endless loop in
> > xfs_sync_inodes_ag().
> >
> > Fix it be detecting if the new search index decreases as a result of
> > incrementing the current inode number. That indicate an overflow and
> > hence that we have finished processing the AG so we can terminate
> > the loop.
>
> Shouldn't this get merged into the patch that introduces the radix-tree
> based sync?
It could be, but I didn't want to have to redo that reviewed patch
series. I've only hit this problem once in the past 6 weeks so I
thought adding a new patch rather than modifying patches to add the
fix was a better way to go....
I can integrate it if you want....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-10-09 0:00 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-07 21:54 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
2008-10-07 21:54 ` [PATCH 1/6] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
2008-10-07 21:54 ` [PATCH 2/6] XFS: rename inode reclaim functions Dave Chinner
2008-10-07 21:54 ` [PATCH 3/6] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
2008-10-07 21:54 ` [PATCH 4/6] XFS: use the inode radix tree for reclaiming inodes Dave Chinner
2008-10-07 21:54 ` [PATCH 5/6] XFS: kill deleted inodes list Dave Chinner
2008-10-07 21:54 ` [PATCH 6/6] XFS: Prevent looping in xfs_sync_inodes_ag Dave Chinner
2008-10-08 18:21 ` Christoph Hellwig
2008-10-09 0:02 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2008-09-13 14:14 [PATCH 0/6] XFS: Track reclaimable inodes in inode cache Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox