* [PATCH 0/5] Combine the XFS and Linux inode structures V2
@ 2008-10-07 21:52 Dave Chinner
2008-10-07 21:52 ` [PATCH 1/5] XFS: factor xfs_iget_core() into hit and miss cases Dave Chinner
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
XFS currently has to deal with two separate inode lifecycles
which makes for complexity in inode lookups and reclaim. We
also have the problem of not always having a linux inode around
when it might be useful to have it.
To avoid these lifecycle problems, this series embedѕ the linux
inode inside the struct xfs_inode and changes the way we reference
the two inodes. We can no longer check for a null linux inode -
instead we have to check to see if it is valid or not by checking
either the linux inode or xfs inode state flags. While this means
that inodes waiting for reclaim use more memory, this is not the
common state for inodes and they will soon be completely freed so
the additional memeory use in this state is only temporary.
This combining of the inodes simplifies the inode and reclaim logic,
making it possible to do reclaim via radix tree tags (an upcoming
patch series) and to be able to use RCU locking on the radix trees.
The fact that we don't have a simple mechanism to determine the
reclaim state of the inode makes RCU locking very complex, and this
complexity is removed by having a combined inode structure.
This patch series also changes the way XFS caches inodes. It no
longer uses the linux inode cache as the primary lookup cache -
instead we rely solely on the XFS inode caches. This avoids the
inode_lock in lookups that hit the cache - we should get much
better parallelism out of inode lookup than we currently do now.
In future, we should also be able to cull duplicate fields out of
the xfs and linux inodes reducing the overall memory usage of
the active inode cache. This provides scope for continuing to
reduce the memory footprint of the XFS inode cache.
Version 2:
o clean up based on review comments
o pull preparation patches into series now there's only two
uncommitted patches.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/5] XFS: factor xfs_iget_core() into hit and miss cases
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
@ 2008-10-07 21:52 ` Dave Chinner
2008-10-07 21:52 ` [PATCH 2/5] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
There are really two cases in xfs_iget_core(). The first is the
cache hit case, the second is the miss case. They share very little
code, and hence can easily be factored out into separate functions.
This makes the code much easier to understand and subsequently
modify.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_iget.c | 348 +++++++++++++++++++++++++++++------------------------
1 files changed, 191 insertions(+), 157 deletions(-)
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 58865fe..b2539b1 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -40,161 +40,119 @@
#include "xfs_utils.h"
/*
- * Look up an inode by number in the given file system.
- * The inode is looked up in the cache held in each AG.
- * If the inode is found in the cache, attach it to the provided
- * vnode.
- *
- * If it is not in core, read it in from the file system's device,
- * add it to the cache and attach the provided vnode.
- *
- * The inode is locked according to the value of the lock_flags parameter.
- * This flag parameter indicates how and if the inode's IO lock and inode lock
- * should be taken.
- *
- * mp -- the mount point structure for the current file system. It points
- * to the inode hash table.
- * tp -- a pointer to the current transaction if there is one. This is
- * simply passed through to the xfs_iread() call.
- * ino -- the number of the inode desired. This is the unique identifier
- * within the file system for the inode being requested.
- * lock_flags -- flags indicating how to lock the inode. See the comment
- * for xfs_ilock() for a list of valid values.
- * bno -- the block number starting the buffer containing the inode,
- * if known (as by bulkstat), else 0.
+ * Check the validity of the inode we just found it the cache
*/
-STATIC int
-xfs_iget_core(
- struct inode *inode,
- xfs_mount_t *mp,
- xfs_trans_t *tp,
- xfs_ino_t ino,
- uint flags,
- uint lock_flags,
- xfs_inode_t **ipp,
- xfs_daddr_t bno)
+static int
+xfs_iget_cache_hit(
+ struct inode *inode,
+ struct xfs_perag *pag,
+ struct xfs_inode *ip,
+ int flags,
+ int lock_flags) __releases(pag->pag_ici_lock)
{
- struct inode *old_inode;
- xfs_inode_t *ip;
- int error;
- unsigned long first_index, mask;
- xfs_perag_t *pag;
- xfs_agino_t agino;
+ struct xfs_mount *mp = ip->i_mount;
+ struct inode *old_inode;
+ int error = 0;
- /* the radix tree exists only in inode capable AGs */
- if (XFS_INO_TO_AGNO(mp, ino) >= mp->m_maxagi)
- return EINVAL;
-
- /* get the perag structure and ensure that it's inode capable */
- pag = xfs_get_perag(mp, ino);
- if (!pag->pagi_inodeok)
- return EINVAL;
- ASSERT(pag->pag_ici_init);
- agino = XFS_INO_TO_AGINO(mp, ino);
-
-again:
- read_lock(&pag->pag_ici_lock);
- ip = radix_tree_lookup(&pag->pag_ici_root, agino);
+ /*
+ * If INEW is set this inode is being set up
+ * Pause and try again.
+ */
+ if (xfs_iflags_test(ip, XFS_INEW)) {
+ error = EAGAIN;
+ XFS_STATS_INC(xs_ig_frecycle);
+ goto out_error;
+ }
- if (ip != NULL) {
+ old_inode = ip->i_vnode;
+ if (old_inode == NULL) {
/*
- * If INEW is set this inode is being set up
+ * If IRECLAIM is set this inode is
+ * on its way out of the system,
* we need to pause and try again.
*/
- if (xfs_iflags_test(ip, XFS_INEW)) {
- read_unlock(&pag->pag_ici_lock);
- delay(1);
+ if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
+ error = EAGAIN;
XFS_STATS_INC(xs_ig_frecycle);
+ goto out_error;
+ }
+ ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE));
- goto again;
+ /*
+ * If lookup is racing with unlink, then we
+ * should return an error immediately so we
+ * don't remove it from the reclaim list and
+ * potentially leak the inode.
+ */
+ if ((ip->i_d.di_mode == 0) &&
+ !(flags & XFS_IGET_CREATE)) {
+ error = ENOENT;
+ goto out_error;
}
+ xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
- old_inode = ip->i_vnode;
- if (old_inode == NULL) {
- /*
- * If IRECLAIM is set this inode is
- * on its way out of the system,
- * we need to pause and try again.
- */
- if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
- read_unlock(&pag->pag_ici_lock);
- delay(1);
- XFS_STATS_INC(xs_ig_frecycle);
-
- goto again;
- }
- ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE));
-
- /*
- * If lookup is racing with unlink, then we
- * should return an error immediately so we
- * don't remove it from the reclaim list and
- * potentially leak the inode.
- */
- if ((ip->i_d.di_mode == 0) &&
- !(flags & XFS_IGET_CREATE)) {
- read_unlock(&pag->pag_ici_lock);
- xfs_put_perag(mp, pag);
- return ENOENT;
- }
-
- xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
-
- XFS_STATS_INC(xs_ig_found);
- xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
- read_unlock(&pag->pag_ici_lock);
-
- XFS_MOUNT_ILOCK(mp);
- list_del_init(&ip->i_reclaim);
- XFS_MOUNT_IUNLOCK(mp);
-
- goto finish_inode;
-
- } else if (inode != old_inode) {
- /* The inode is being torn down, pause and
- * try again.
- */
- if (old_inode->i_state & (I_FREEING | I_CLEAR)) {
- read_unlock(&pag->pag_ici_lock);
- delay(1);
- XFS_STATS_INC(xs_ig_frecycle);
-
- goto again;
- }
+ xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+ read_unlock(&pag->pag_ici_lock);
+
+ XFS_MOUNT_ILOCK(mp);
+ list_del_init(&ip->i_reclaim);
+ XFS_MOUNT_IUNLOCK(mp);
+
+ } else if (inode != old_inode) {
+ /* The inode is being torn down, pause and
+ * try again.
+ */
+ if (old_inode->i_state & (I_FREEING | I_CLEAR)) {
+ error = EAGAIN;
+ XFS_STATS_INC(xs_ig_frecycle);
+ goto out_error;
+ }
/* Chances are the other vnode (the one in the inode) is being torn
* down right now, and we landed on top of it. Question is, what do
* we do? Unhook the old inode and hook up the new one?
*/
- cmn_err(CE_PANIC,
- "xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p",
- old_inode, inode);
- }
-
- /*
- * Inode cache hit
- */
+ cmn_err(CE_PANIC,
+ "xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p",
+ old_inode, inode);
+ } else {
read_unlock(&pag->pag_ici_lock);
- XFS_STATS_INC(xs_ig_found);
+ }
-finish_inode:
- if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
- xfs_put_perag(mp, pag);
- return ENOENT;
- }
+ if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+ error = ENOENT;
+ goto out;
+ }
- if (lock_flags != 0)
- xfs_ilock(ip, lock_flags);
+ if (lock_flags != 0)
+ xfs_ilock(ip, lock_flags);
- xfs_iflags_clear(ip, XFS_ISTALE);
- xfs_itrace_exit_tag(ip, "xfs_iget.found");
- goto return_ip;
- }
+ xfs_iflags_clear(ip, XFS_ISTALE);
+ xfs_itrace_exit_tag(ip, "xfs_iget.found");
+ XFS_STATS_INC(xs_ig_found);
+ return 0;
- /*
- * Inode cache miss
- */
+out_error:
read_unlock(&pag->pag_ici_lock);
- XFS_STATS_INC(xs_ig_missed);
+out:
+ return error;
+}
+
+
+static int
+xfs_iget_cache_miss(
+ struct xfs_mount *mp,
+ struct xfs_perag *pag,
+ xfs_trans_t *tp,
+ xfs_ino_t ino,
+ struct xfs_inode **ipp,
+ xfs_daddr_t bno,
+ int flags,
+ int lock_flags) __releases(pag->pag_ici_lock)
+{
+ struct xfs_inode *ip;
+ int error;
+ unsigned long first_index, mask;
+ xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ino);
/*
* Read the disk inode attributes into a new inode structure and get
@@ -202,17 +160,14 @@ finish_inode:
*/
error = xfs_iread(mp, tp, ino, &ip, bno,
(flags & XFS_IGET_BULKSTAT) ? XFS_IMAP_BULKSTAT : 0);
- if (error) {
- xfs_put_perag(mp, pag);
+ if (error)
return error;
- }
xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
- xfs_idestroy(ip);
- xfs_put_perag(mp, pag);
- return ENOENT;
+ error = ENOENT;
+ goto out_destroy;
}
/*
@@ -220,9 +175,8 @@ finish_inode:
* write spinlock.
*/
if (radix_tree_preload(GFP_KERNEL)) {
- xfs_idestroy(ip);
- delay(1);
- goto again;
+ error = EAGAIN;
+ goto out_destroy;
}
if (lock_flags)
@@ -231,32 +185,104 @@ finish_inode:
mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
first_index = agino & mask;
write_lock(&pag->pag_ici_lock);
- /*
- * insert the new inode
- */
+
+ /* insert the new inode */
error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
if (unlikely(error)) {
- BUG_ON(error != -EEXIST);
- write_unlock(&pag->pag_ici_lock);
- radix_tree_preload_end();
- if (lock_flags)
- xfs_iunlock(ip, lock_flags);
- xfs_idestroy(ip);
+ WARN_ON(error != -EEXIST);
XFS_STATS_INC(xs_ig_dup);
- goto again;
+ error = EAGAIN;
+ goto out_unlock;
}
- /*
- * These values _must_ be set before releasing the radix tree lock!
- */
+ /* These values _must_ be set before releasing the radix tree lock! */
ip->i_udquot = ip->i_gdquot = NULL;
xfs_iflags_set(ip, XFS_INEW);
write_unlock(&pag->pag_ici_lock);
radix_tree_preload_end();
+ *ipp = ip;
+ return 0;
+
+out_unlock:
+ write_unlock(&pag->pag_ici_lock);
+ radix_tree_preload_end();
+out_destroy:
+ xfs_idestroy(ip);
+ return error;
+}
+
+/*
+ * Look up an inode by number in the given file system.
+ * The inode is looked up in the cache held in each AG.
+ * If the inode is found in the cache, attach it to the provided
+ * vnode.
+ *
+ * If it is not in core, read it in from the file system's device,
+ * add it to the cache and attach the provided vnode.
+ *
+ * The inode is locked according to the value of the lock_flags parameter.
+ * This flag parameter indicates how and if the inode's IO lock and inode lock
+ * should be taken.
+ *
+ * mp -- the mount point structure for the current file system. It points
+ * to the inode hash table.
+ * tp -- a pointer to the current transaction if there is one. This is
+ * simply passed through to the xfs_iread() call.
+ * ino -- the number of the inode desired. This is the unique identifier
+ * within the file system for the inode being requested.
+ * lock_flags -- flags indicating how to lock the inode. See the comment
+ * for xfs_ilock() for a list of valid values.
+ * bno -- the block number starting the buffer containing the inode,
+ * if known (as by bulkstat), else 0.
+ */
+STATIC int
+xfs_iget_core(
+ struct inode *inode,
+ xfs_mount_t *mp,
+ xfs_trans_t *tp,
+ xfs_ino_t ino,
+ uint flags,
+ uint lock_flags,
+ xfs_inode_t **ipp,
+ xfs_daddr_t bno)
+{
+ xfs_inode_t *ip;
+ int error;
+ xfs_perag_t *pag;
+ xfs_agino_t agino;
+
+ /* the radix tree exists only in inode capable AGs */
+ if (XFS_INO_TO_AGNO(mp, ino) >= mp->m_maxagi)
+ return EINVAL;
+
+ /* get the perag structure and ensure that it's inode capable */
+ pag = xfs_get_perag(mp, ino);
+ if (!pag->pagi_inodeok)
+ return EINVAL;
+ ASSERT(pag->pag_ici_init);
+ agino = XFS_INO_TO_AGINO(mp, ino);
+
+again:
+ error = 0;
+ read_lock(&pag->pag_ici_lock);
+ ip = radix_tree_lookup(&pag->pag_ici_root, agino);
+
+ if (ip) {
+ error = xfs_iget_cache_hit(inode, pag, ip, flags, lock_flags);
+ if (error)
+ goto out_error_or_again;
+ } else {
+ read_unlock(&pag->pag_ici_lock);
+ XFS_STATS_INC(xs_ig_missed);
+
+ error = xfs_iget_cache_miss(mp, pag, tp, ino, &ip, bno,
+ flags, lock_flags);
+ if (error)
+ goto out_error_or_again;
+ }
xfs_put_perag(mp, pag);
- return_ip:
ASSERT(ip->i_df.if_ext_max ==
XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
@@ -276,6 +302,14 @@ finish_inode:
if (ip->i_d.di_mode != 0)
xfs_setup_inode(ip);
return 0;
+
+out_error_or_again:
+ if (error == EAGAIN) {
+ delay(1);
+ goto again;
+ }
+ xfs_put_perag(mp, pag);
+ return error;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] XFS: Never call mark_inode_dirty_sync() directly
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
2008-10-07 21:52 ` [PATCH 1/5] XFS: factor xfs_iget_core() into hit and miss cases Dave Chinner
@ 2008-10-07 21:52 ` Dave Chinner
2008-10-07 21:52 ` [PATCH 3/5] Inode: Allow external initialisers Dave Chinner
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
Once the Linux inode and the XFS inode are combined, we cannot rely
on just check if the linux inode exists as a method of determining
if it is valid or not. Hence we should always call
xfs_mark_inode_dirty_sync() instead as it does the correct checks to
determine if the liinux inode is in a valid state or not.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_aops.c | 2 +-
fs/xfs/linux-2.6/xfs_iops.c | 2 +-
fs/xfs/linux-2.6/xfs_super.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index a44d68e..8fbc97d 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -191,7 +191,7 @@ xfs_setfilesize(
ip->i_d.di_size = isize;
ip->i_update_core = 1;
ip->i_update_size = 1;
- mark_inode_dirty_sync(ioend->io_inode);
+ xfs_mark_inode_dirty_sync(ip);
}
xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 095d271..3bfb3c0 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -128,7 +128,7 @@ xfs_ichgtime(
if (sync_it) {
SYNCHRONIZE();
ip->i_update_core = 1;
- mark_inode_dirty_sync(inode);
+ xfs_mark_inode_dirty_sync(ip);
}
}
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 1f9c5a9..7474859 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -948,7 +948,7 @@ xfs_fs_write_inode(
* it dirty again so we'll try again later.
*/
if (error)
- mark_inode_dirty_sync(inode);
+ xfs_mark_inode_dirty_sync(XFS_I(inode));
return -error;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] Inode: Allow external initialisers
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
2008-10-07 21:52 ` [PATCH 1/5] XFS: factor xfs_iget_core() into hit and miss cases Dave Chinner
2008-10-07 21:52 ` [PATCH 2/5] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
@ 2008-10-07 21:52 ` Dave Chinner
2008-10-14 7:00 ` Lachlan McIlroy
2008-10-07 21:52 ` [PATCH 4/5] Inode: Allow external list initialisation Dave Chinner
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
To allow XFS to combine the XFS and linux inodes into a single
structure, we need to drive inode lookup from the XFS inode cache,
not the generic inode cache. This means that we need initialise a
struct inode from a context outside alloc_inode() as it is no longer
used by XFS.
Factor and export the struct inode initialisation code from
alloc_inode() to inode_init_always() as a counterpart to
inode_init_once(). i.e. we have to call this init function for each
inode instantiation (always), as opposed inode_init_once() which is
only called on slab object instantiation (once).
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/inode.c | 152 +++++++++++++++++++++++++++++-----------------------
include/linux/fs.h | 1 +
2 files changed, 85 insertions(+), 68 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 0487ddb..e7ee999 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -108,84 +108,100 @@ static void wake_up_inode(struct inode *inode)
wake_up_bit(&inode->i_state, __I_LOCK);
}
-static struct inode *alloc_inode(struct super_block *sb)
+/**
+ * inode_init_always - perform inode structure intialisation
+ * @sb - superblock inode belongs to.
+ * @inode - inode to initialise
+ *
+ * These are initializations that need to be done on every inode
+ * allocation as the fields are not initialised by slab allocation.
+ */
+struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
{
static const struct address_space_operations empty_aops;
static struct inode_operations empty_iops;
static const struct file_operations empty_fops;
- struct inode *inode;
-
- if (sb->s_op->alloc_inode)
- inode = sb->s_op->alloc_inode(sb);
- else
- inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
- if (inode) {
- struct address_space * const mapping = &inode->i_data;
-
- inode->i_sb = sb;
- inode->i_blkbits = sb->s_blocksize_bits;
- inode->i_flags = 0;
- atomic_set(&inode->i_count, 1);
- inode->i_op = &empty_iops;
- inode->i_fop = &empty_fops;
- inode->i_nlink = 1;
- atomic_set(&inode->i_writecount, 0);
- inode->i_size = 0;
- inode->i_blocks = 0;
- inode->i_bytes = 0;
- inode->i_generation = 0;
+ struct address_space * const mapping = &inode->i_data;
+
+ inode->i_sb = sb;
+ inode->i_blkbits = sb->s_blocksize_bits;
+ inode->i_flags = 0;
+ atomic_set(&inode->i_count, 1);
+ inode->i_op = &empty_iops;
+ inode->i_fop = &empty_fops;
+ inode->i_nlink = 1;
+ atomic_set(&inode->i_writecount, 0);
+ inode->i_size = 0;
+ inode->i_blocks = 0;
+ inode->i_bytes = 0;
+ inode->i_generation = 0;
#ifdef CONFIG_QUOTA
- memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
+ memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
- inode->i_pipe = NULL;
- inode->i_bdev = NULL;
- inode->i_cdev = NULL;
- inode->i_rdev = 0;
- inode->dirtied_when = 0;
- if (security_inode_alloc(inode)) {
- if (inode->i_sb->s_op->destroy_inode)
- inode->i_sb->s_op->destroy_inode(inode);
- else
- kmem_cache_free(inode_cachep, (inode));
- return NULL;
- }
+ inode->i_pipe = NULL;
+ inode->i_bdev = NULL;
+ inode->i_cdev = NULL;
+ inode->i_rdev = 0;
+ inode->dirtied_when = 0;
+ if (security_inode_alloc(inode)) {
+ if (inode->i_sb->s_op->destroy_inode)
+ inode->i_sb->s_op->destroy_inode(inode);
+ else
+ kmem_cache_free(inode_cachep, (inode));
+ return NULL;
+ }
- spin_lock_init(&inode->i_lock);
- lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
+ spin_lock_init(&inode->i_lock);
+ lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
- mutex_init(&inode->i_mutex);
- lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
+ mutex_init(&inode->i_mutex);
+ lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
- init_rwsem(&inode->i_alloc_sem);
- lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+ init_rwsem(&inode->i_alloc_sem);
+ lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
- mapping->a_ops = &empty_aops;
- mapping->host = inode;
- mapping->flags = 0;
- mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
- mapping->assoc_mapping = NULL;
- mapping->backing_dev_info = &default_backing_dev_info;
- mapping->writeback_index = 0;
+ mapping->a_ops = &empty_aops;
+ mapping->host = inode;
+ mapping->flags = 0;
+ mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
+ mapping->assoc_mapping = NULL;
+ mapping->backing_dev_info = &default_backing_dev_info;
+ mapping->writeback_index = 0;
- /*
- * If the block_device provides a backing_dev_info for client
- * inodes then use that. Otherwise the inode share the bdev's
- * backing_dev_info.
- */
- if (sb->s_bdev) {
- struct backing_dev_info *bdi;
+ /*
+ * If the block_device provides a backing_dev_info for client
+ * inodes then use that. Otherwise the inode share the bdev's
+ * backing_dev_info.
+ */
+ if (sb->s_bdev) {
+ struct backing_dev_info *bdi;
- bdi = sb->s_bdev->bd_inode_backing_dev_info;
- if (!bdi)
- bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
- mapping->backing_dev_info = bdi;
- }
- inode->i_private = NULL;
- inode->i_mapping = mapping;
+ bdi = sb->s_bdev->bd_inode_backing_dev_info;
+ if (!bdi)
+ bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
+ mapping->backing_dev_info = bdi;
}
+ inode->i_private = NULL;
+ inode->i_mapping = mapping;
+
return inode;
}
+EXPORT_SYMBOL(inode_init_always);
+
+static struct inode *alloc_inode(struct super_block *sb)
+{
+ struct inode *inode;
+
+ if (sb->s_op->alloc_inode)
+ inode = sb->s_op->alloc_inode(sb);
+ else
+ inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+
+ if (inode)
+ return inode_init_always(sb, inode);
+ return NULL;
+}
void destroy_inode(struct inode *inode)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..ce55983 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1811,6 +1811,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int origin);
extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
+extern struct inode * inode_init_always(struct super_block *, struct inode *);
extern void inode_init_once(struct inode *);
extern void iput(struct inode *);
extern struct inode * igrab(struct inode *);
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] Inode: Allow external list initialisation
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
` (2 preceding siblings ...)
2008-10-07 21:52 ` [PATCH 3/5] Inode: Allow external initialisers Dave Chinner
@ 2008-10-07 21:52 ` Dave Chinner
2008-10-07 21:52 ` [PATCH 5/5] XFS: Combine the XFS and Linux inodes V3 Dave Chinner
2008-10-09 4:21 ` [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim Dave Chinner
5 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
To allow XFS to combine the XFS and linux inodes into a single
structure, we need to drive inode lookup from the XFS inode cache,
not the generic inode cache. This means that we need initialise a
struct inode from a context outside alloc_inode() as it is no longer
used by XFS.
After inode allocation and initialisation, we need to add the inode
to the superblock list, the in-use list, hash it and do some
accounting. This all needs to be done with the inode_lock held and
there are already several places in fs/inode.c that do this list
manipulation. Factor out the common code, add a locking wrapper and
export the function so ti can be called from XFS.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/inode.c | 67 +++++++++++++++++++++++++++++++++++----------------
include/linux/fs.h | 1 +
2 files changed, 47 insertions(+), 21 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index e7ee999..fbcf6c5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -550,6 +550,49 @@ repeat:
return node ? inode : NULL;
}
+static unsigned long hash(struct super_block *sb, unsigned long hashval)
+{
+ unsigned long tmp;
+
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
+ return tmp & I_HASHMASK;
+}
+
+static inline void
+__inode_add_to_lists(struct super_block *sb, struct hlist_head *head,
+ struct inode *inode)
+{
+ inodes_stat.nr_inodes++;
+ list_add(&inode->i_list, &inode_in_use);
+ list_add(&inode->i_sb_list, &sb->s_inodes);
+ if (head)
+ hlist_add_head(&inode->i_hash, head);
+}
+
+/**
+ * inode_add_to_lists - add a new inode to relevant lists
+ * @sb - superblock inode belongs to.
+ * @inode - inode to mark in use
+ *
+ * When an inode is allocated it needs to be accounted for, added to the in use
+ * list, the owning superblock and the inode hash. This needs to be done under
+ * the inode_lock, so export a function to do this rather than the inode lock
+ * itself. We calculate the hash list to add to here so it is all internal
+ * which requires the caller to have already set up the inode number in the
+ * inode to add.
+ */
+void inode_add_to_lists(struct super_block *sb, struct inode *inode)
+{
+ struct hlist_head *head = inode_hashtable + hash(sb, inode->i_ino);
+
+ spin_lock(&inode_lock);
+ __inode_add_to_lists(sb, head, inode);
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inode_add_to_lists);
+
/**
* new_inode - obtain an inode
* @sb: superblock
@@ -577,9 +620,7 @@ struct inode *new_inode(struct super_block *sb)
inode = alloc_inode(sb);
if (inode) {
spin_lock(&inode_lock);
- inodes_stat.nr_inodes++;
- list_add(&inode->i_list, &inode_in_use);
- list_add(&inode->i_sb_list, &sb->s_inodes);
+ __inode_add_to_lists(sb, NULL, inode);
inode->i_ino = ++last_ino;
inode->i_state = 0;
spin_unlock(&inode_lock);
@@ -638,10 +679,7 @@ static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *h
if (set(inode, data))
goto set_failed;
- inodes_stat.nr_inodes++;
- list_add(&inode->i_list, &inode_in_use);
- list_add(&inode->i_sb_list, &sb->s_inodes);
- hlist_add_head(&inode->i_hash, head);
+ __inode_add_to_lists(sb, head, inode);
inode->i_state = I_LOCK|I_NEW;
spin_unlock(&inode_lock);
@@ -687,10 +725,7 @@ static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_he
old = find_inode_fast(sb, head, ino);
if (!old) {
inode->i_ino = ino;
- inodes_stat.nr_inodes++;
- list_add(&inode->i_list, &inode_in_use);
- list_add(&inode->i_sb_list, &sb->s_inodes);
- hlist_add_head(&inode->i_hash, head);
+ __inode_add_to_lists(sb, head, inode);
inode->i_state = I_LOCK|I_NEW;
spin_unlock(&inode_lock);
@@ -714,16 +749,6 @@ static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_he
return inode;
}
-static unsigned long hash(struct super_block *sb, unsigned long hashval)
-{
- unsigned long tmp;
-
- tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
- L1_CACHE_BYTES;
- tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
- return tmp & I_HASHMASK;
-}
-
/**
* iunique - get a unique inode number
* @sb: superblock
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ce55983..cf65c71 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1813,6 +1813,7 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
extern struct inode * inode_init_always(struct super_block *, struct inode *);
extern void inode_init_once(struct inode *);
+extern void inode_add_to_lists(struct super_block *, struct inode *);
extern void iput(struct inode *);
extern struct inode * igrab(struct inode *);
extern ino_t iunique(struct super_block *, ino_t);
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] XFS: Combine the XFS and Linux inodes V3
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
` (3 preceding siblings ...)
2008-10-07 21:52 ` [PATCH 4/5] Inode: Allow external list initialisation Dave Chinner
@ 2008-10-07 21:52 ` Dave Chinner
2008-10-09 4:21 ` [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim Dave Chinner
5 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 21:52 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
To avoid issues with different lifecycles of XFS and Linux inodes,
embedd the linux inode inside the XFS inode. This means that the
linux inode has the same lifecycle as the XFS inode, even when it
has been released by the OS. XFS inodes don't live much longer than
this (a short stint in reclaim at most), so there isn't significant
memory usage penalties here.
Version 3
o kill xfs_icount()
Version 2
o remove unused commented out code from xfs_iget().
o kill useless cast in VFS_I()
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_iops.c | 17 +++-
fs/xfs/linux-2.6/xfs_super.c | 47 +++++-------
fs/xfs/linux-2.6/xfs_vnode.c | 15 +----
fs/xfs/xfs_iget.c | 167 +++++++++---------------------------------
fs/xfs/xfs_inode.c | 43 ++++++++---
fs/xfs/xfs_inode.h | 9 ++-
fs/xfs/xfs_vnodeops.c | 13 +---
7 files changed, 108 insertions(+), 203 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 3bfb3c0..37bb101 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -64,14 +64,14 @@ xfs_synchronize_atime(
{
struct inode *inode = VFS_I(ip);
- if (inode) {
+ if (!(inode->i_state & I_CLEAR)) {
ip->i_d.di_atime.t_sec = (__int32_t)inode->i_atime.tv_sec;
ip->i_d.di_atime.t_nsec = (__int32_t)inode->i_atime.tv_nsec;
}
}
/*
- * If the linux inode exists, mark it dirty.
+ * If the linux inode is valid, mark it dirty.
* Used when commiting a dirty inode into a transaction so that
* the inode will get written back by the linux code
*/
@@ -81,7 +81,7 @@ xfs_mark_inode_dirty_sync(
{
struct inode *inode = VFS_I(ip);
- if (inode)
+ if (!(inode->i_state & (I_WILL_FREE|I_FREEING|I_CLEAR)))
mark_inode_dirty_sync(inode);
}
@@ -766,12 +766,21 @@ xfs_diflags_to_iflags(
* When reading existing inodes from disk this is called directly
* from xfs_iget, when creating a new inode it is called from
* xfs_ialloc after setting up the inode.
+ *
+ * We are always called with an uninitialised linux inode here.
+ * We need to initialise the necessary fields and take a reference
+ * on it.
*/
void
xfs_setup_inode(
struct xfs_inode *ip)
{
- struct inode *inode = ip->i_vnode;
+ struct inode *inode = &ip->i_vnode;
+
+ inode->i_ino = ip->i_ino;
+ inode->i_state = I_NEW|I_LOCK;
+ inode_add_to_lists(ip->i_mount->m_super, inode);
+ ASSERT(atomic_read(&inode->i_count) == 1);
inode->i_mode = ip->i_d.di_mode;
inode->i_nlink = ip->i_d.di_nlink;
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 7474859..b893f8c 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -72,7 +72,6 @@
static struct quotactl_ops xfs_quotactl_operations;
static struct super_operations xfs_super_operations;
-static kmem_zone_t *xfs_vnode_zone;
static kmem_zone_t *xfs_ioend_zone;
mempool_t *xfs_ioend_pool;
@@ -867,29 +866,24 @@ xfsaild_stop(
}
-
+/* Catch misguided souls that try to use this interface on XFS */
STATIC struct inode *
xfs_fs_alloc_inode(
struct super_block *sb)
{
- return kmem_zone_alloc(xfs_vnode_zone, KM_SLEEP);
+ BUG();
}
+/*
+ * we need to provide an empty inode free function to prevent
+ * the generic code from trying to free our combined inode.
+ */
STATIC void
xfs_fs_destroy_inode(
- struct inode *inode)
-{
- kmem_zone_free(xfs_vnode_zone, inode);
-}
-
-STATIC void
-xfs_fs_inode_init_once(
- void *vnode)
+ struct inode *inode)
{
- inode_init_once((struct inode *)vnode);
}
-
/*
* Slab object creation initialisation for the XFS inode.
* This covers only the idempotent fields in the XFS inode;
@@ -898,13 +892,18 @@ xfs_fs_inode_init_once(
* fields in the xfs inode that left in the initialise state
* when freeing the inode.
*/
-void
-xfs_inode_init_once(
+STATIC void
+xfs_fs_inode_init_once(
void *inode)
{
struct xfs_inode *ip = inode;
memset(ip, 0, sizeof(struct xfs_inode));
+
+ /* vfs inode */
+ inode_init_once(VFS_I(ip));
+
+ /* xfs inode */
atomic_set(&ip->i_iocount, 0);
atomic_set(&ip->i_pincount, 0);
spin_lock_init(&ip->i_flags_lock);
@@ -975,8 +974,6 @@ xfs_fs_clear_inode(
if (xfs_reclaim(ip))
panic("%s: cannot reclaim 0x%p\n", __func__, inode);
}
-
- ASSERT(XFS_I(inode) == NULL);
}
STATIC void
@@ -1838,16 +1835,10 @@ xfs_free_trace_bufs(void)
STATIC int __init
xfs_init_zones(void)
{
- xfs_vnode_zone = kmem_zone_init_flags(sizeof(struct inode), "xfs_vnode",
- KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
- KM_ZONE_SPREAD,
- xfs_fs_inode_init_once);
- if (!xfs_vnode_zone)
- goto out;
xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend");
if (!xfs_ioend_zone)
- goto out_destroy_vnode_zone;
+ goto out;
xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE,
xfs_ioend_zone);
@@ -1863,6 +1854,7 @@ xfs_init_zones(void)
"xfs_bmap_free_item");
if (!xfs_bmap_free_item_zone)
goto out_destroy_log_ticket_zone;
+
xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t),
"xfs_btree_cur");
if (!xfs_btree_cur_zone)
@@ -1910,8 +1902,8 @@ xfs_init_zones(void)
xfs_inode_zone =
kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
- KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
- KM_ZONE_SPREAD, xfs_inode_init_once);
+ KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD,
+ xfs_fs_inode_init_once);
if (!xfs_inode_zone)
goto out_destroy_efi_zone;
@@ -1959,8 +1951,6 @@ xfs_init_zones(void)
mempool_destroy(xfs_ioend_pool);
out_destroy_ioend_zone:
kmem_zone_destroy(xfs_ioend_zone);
- out_destroy_vnode_zone:
- kmem_zone_destroy(xfs_vnode_zone);
out:
return -ENOMEM;
}
@@ -1985,7 +1975,6 @@ xfs_destroy_zones(void)
kmem_zone_destroy(xfs_log_ticket_zone);
mempool_destroy(xfs_ioend_pool);
kmem_zone_destroy(xfs_ioend_zone);
- kmem_zone_destroy(xfs_vnode_zone);
}
diff --git a/fs/xfs/linux-2.6/xfs_vnode.c b/fs/xfs/linux-2.6/xfs_vnode.c
index dceb6db..933f89a 100644
--- a/fs/xfs/linux-2.6/xfs_vnode.c
+++ b/fs/xfs/linux-2.6/xfs_vnode.c
@@ -84,25 +84,12 @@ vn_ioerror(
#ifdef XFS_INODE_TRACE
-/*
- * Reference count of Linux inode if present, -1 if the xfs_inode
- * has no associated Linux inode.
- */
-static inline int xfs_icount(struct xfs_inode *ip)
-{
- struct inode *inode = VFS_I(ip);
-
- if (!inode)
- return atomic_read(&inode->i_count);
- return -1;
-}
-
#define KTRACE_ENTER(ip, vk, s, line, ra) \
ktrace_enter( (ip)->i_trace, \
/* 0 */ (void *)(__psint_t)(vk), \
/* 1 */ (void *)(s), \
/* 2 */ (void *)(__psint_t) line, \
-/* 3 */ (void *)(__psint_t)xfs_icount(ip), \
+/* 3 */ (void *)(__psint_t)atomic_read(&VFS_I(ip)->i_count) \
/* 4 */ (void *)(ra), \
/* 5 */ NULL, \
/* 6 */ (void *)(__psint_t)current_cpu(), \
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index b2539b1..c4414e8 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -44,77 +44,65 @@
*/
static int
xfs_iget_cache_hit(
- struct inode *inode,
struct xfs_perag *pag,
struct xfs_inode *ip,
int flags,
int lock_flags) __releases(pag->pag_ici_lock)
{
struct xfs_mount *mp = ip->i_mount;
- struct inode *old_inode;
int error = 0;
/*
* If INEW is set this inode is being set up
+ * If IRECLAIM is set this inode is being torn down
* Pause and try again.
*/
- if (xfs_iflags_test(ip, XFS_INEW)) {
+ if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
error = EAGAIN;
XFS_STATS_INC(xs_ig_frecycle);
goto out_error;
}
- old_inode = ip->i_vnode;
- if (old_inode == NULL) {
+ /* If IRECLAIMABLE is set, we've torn down the vfs inode part */
+ if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+
/*
- * If IRECLAIM is set this inode is
- * on its way out of the system,
- * we need to pause and try again.
+ * If lookup is racing with unlink, then we should return an
+ * error immediately so we don't remove it from the reclaim
+ * list and potentially leak the inode.
*/
- if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
- error = EAGAIN;
- XFS_STATS_INC(xs_ig_frecycle);
+
+ if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
+ error = ENOENT;
goto out_error;
}
- ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+
+ xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
/*
- * If lookup is racing with unlink, then we
- * should return an error immediately so we
- * don't remove it from the reclaim list and
- * potentially leak the inode.
+ * We need to re-initialise the VFS inode as it has been
+ * 'freed' by the VFS. Do this here so we can deal with
+ * errors cleanly, then tag it so it can be set up correctly
+ * later.
*/
- if ((ip->i_d.di_mode == 0) &&
- !(flags & XFS_IGET_CREATE)) {
- error = ENOENT;
+ if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+ error = ENOMEM;
goto out_error;
}
- xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
-
+ xfs_iflags_set(ip, XFS_INEW);
xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
read_unlock(&pag->pag_ici_lock);
XFS_MOUNT_ILOCK(mp);
list_del_init(&ip->i_reclaim);
XFS_MOUNT_IUNLOCK(mp);
-
- } else if (inode != old_inode) {
- /* The inode is being torn down, pause and
- * try again.
- */
- if (old_inode->i_state & (I_FREEING | I_CLEAR)) {
- error = EAGAIN;
- XFS_STATS_INC(xs_ig_frecycle);
- goto out_error;
- }
-/* Chances are the other vnode (the one in the inode) is being torn
-* down right now, and we landed on top of it. Question is, what do
-* we do? Unhook the old inode and hook up the new one?
-*/
- cmn_err(CE_PANIC,
- "xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p",
- old_inode, inode);
+ } else if (!igrab(VFS_I(ip))) {
+ /* If the VFS inode is being torn down, pause and try again. */
+ error = EAGAIN;
+ XFS_STATS_INC(xs_ig_frecycle);
+ goto out_error;
} else {
+ /* we've got a live one */
read_unlock(&pag->pag_ici_lock);
}
@@ -215,11 +203,11 @@ out_destroy:
/*
* Look up an inode by number in the given file system.
* The inode is looked up in the cache held in each AG.
- * If the inode is found in the cache, attach it to the provided
- * vnode.
+ * If the inode is found in the cache, initialise the vfs inode
+ * if necessary.
*
* If it is not in core, read it in from the file system's device,
- * add it to the cache and attach the provided vnode.
+ * add it to the cache and initialise the vfs inode.
*
* The inode is locked according to the value of the lock_flags parameter.
* This flag parameter indicates how and if the inode's IO lock and inode lock
@@ -236,9 +224,8 @@ out_destroy:
* bno -- the block number starting the buffer containing the inode,
* if known (as by bulkstat), else 0.
*/
-STATIC int
-xfs_iget_core(
- struct inode *inode,
+int
+xfs_iget(
xfs_mount_t *mp,
xfs_trans_t *tp,
xfs_ino_t ino,
@@ -269,7 +256,7 @@ again:
ip = radix_tree_lookup(&pag->pag_ici_root, agino);
if (ip) {
- error = xfs_iget_cache_hit(inode, pag, ip, flags, lock_flags);
+ error = xfs_iget_cache_hit(pag, ip, flags, lock_flags);
if (error)
goto out_error_or_again;
} else {
@@ -283,23 +270,16 @@ again:
}
xfs_put_perag(mp, pag);
- ASSERT(ip->i_df.if_ext_max ==
- XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
-
xfs_iflags_set(ip, XFS_IMODIFIED);
*ipp = ip;
- /*
- * Set up the Linux with the Linux inode.
- */
- ip->i_vnode = inode;
- inode->i_private = ip;
-
+ ASSERT(ip->i_df.if_ext_max ==
+ XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
/*
* If we have a real type for an on-disk inode, we can set ops(&unlock)
* now. If it's a new inode being created, xfs_ialloc will handle it.
*/
- if (ip->i_d.di_mode != 0)
+ if (xfs_iflags_test(ip, XFS_INEW) && ip->i_d.di_mode != 0)
xfs_setup_inode(ip);
return 0;
@@ -314,75 +294,6 @@ out_error_or_again:
/*
- * The 'normal' internal xfs_iget, if needed it will
- * 'allocate', or 'get', the vnode.
- */
-int
-xfs_iget(
- xfs_mount_t *mp,
- xfs_trans_t *tp,
- xfs_ino_t ino,
- uint flags,
- uint lock_flags,
- xfs_inode_t **ipp,
- xfs_daddr_t bno)
-{
- struct inode *inode;
- xfs_inode_t *ip;
- int error;
-
- XFS_STATS_INC(xs_ig_attempts);
-
-retry:
- inode = iget_locked(mp->m_super, ino);
- if (!inode)
- /* If we got no inode we are out of memory */
- return ENOMEM;
-
- if (inode->i_state & I_NEW) {
- XFS_STATS_INC(vn_active);
- XFS_STATS_INC(vn_alloc);
-
- error = xfs_iget_core(inode, mp, tp, ino, flags,
- lock_flags, ipp, bno);
- if (error) {
- make_bad_inode(inode);
- if (inode->i_state & I_NEW)
- unlock_new_inode(inode);
- iput(inode);
- }
- return error;
- }
-
- /*
- * If the inode is not fully constructed due to
- * filehandle mismatches wait for the inode to go
- * away and try again.
- *
- * iget_locked will call __wait_on_freeing_inode
- * to wait for the inode to go away.
- */
- if (is_bad_inode(inode)) {
- iput(inode);
- delay(1);
- goto retry;
- }
-
- ip = XFS_I(inode);
- if (!ip) {
- iput(inode);
- delay(1);
- goto retry;
- }
-
- if (lock_flags != 0)
- xfs_ilock(ip, lock_flags);
- XFS_STATS_INC(xs_ig_found);
- *ipp = ip;
- return 0;
-}
-
-/*
* Look for the inode corresponding to the given ino in the hash table.
* If it is there and its i_transp pointer matches tp, return it.
* Otherwise, return NULL.
@@ -482,14 +393,6 @@ xfs_ireclaim(xfs_inode_t *ip)
XFS_QM_DQDETACH(ip->i_mount, ip);
/*
- * Pull our behavior descriptor from the vnode chain.
- */
- if (ip->i_vnode) {
- ip->i_vnode->i_private = NULL;
- ip->i_vnode = NULL;
- }
-
- /*
* Free all memory associated with the inode.
*/
xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0c65ba2..307a63f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -812,6 +812,16 @@ xfs_inode_alloc(
ASSERT(!spin_is_locked(&ip->i_flags_lock));
ASSERT(list_empty(&ip->i_reclaim));
+ /*
+ * initialise the VFS inode here to get failures
+ * out of the way early.
+ */
+ if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+ kmem_zone_free(xfs_inode_zone, ip);
+ return NULL;
+ }
+
+ /* initialise the xfs inode */
ip->i_ino = ino;
ip->i_mount = mp;
ip->i_blkno = 0;
@@ -1085,6 +1095,7 @@ xfs_ialloc(
uint flags;
int error;
timespec_t tv;
+ int filestreams = 0;
/*
* Call the space management code to pick
@@ -1092,9 +1103,8 @@ xfs_ialloc(
*/
error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
ialloc_context, call_again, &ino);
- if (error != 0) {
+ if (error)
return error;
- }
if (*call_again || ino == NULLFSINO) {
*ipp = NULL;
return 0;
@@ -1108,9 +1118,8 @@ xfs_ialloc(
*/
error = xfs_trans_iget(tp->t_mountp, tp, ino,
XFS_IGET_CREATE, XFS_ILOCK_EXCL, &ip);
- if (error != 0) {
+ if (error)
return error;
- }
ASSERT(ip != NULL);
ip->i_d.di_mode = (__uint16_t)mode;
@@ -1191,13 +1200,12 @@ xfs_ialloc(
flags |= XFS_ILOG_DEV;
break;
case S_IFREG:
- if (pip && xfs_inode_is_filestream(pip)) {
- error = xfs_filestream_associate(pip, ip);
- if (error < 0)
- return -error;
- if (!error)
- xfs_iflags_set(ip, XFS_IFILESTREAM);
- }
+ /*
+ * we can't set up filestreams until after the VFS inode
+ * is set up properly.
+ */
+ if (pip && xfs_inode_is_filestream(pip))
+ filestreams = 1;
/* fall through */
case S_IFDIR:
if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
@@ -1263,6 +1271,15 @@ xfs_ialloc(
/* now that we have an i_mode we can setup inode ops and unlock */
xfs_setup_inode(ip);
+ /* now we have set up the vfs inode we can associate the filestream */
+ if (filestreams) {
+ error = xfs_filestream_associate(pip, ip);
+ if (error < 0)
+ return -error;
+ if (!error)
+ xfs_iflags_set(ip, XFS_IFILESTREAM);
+ }
+
*ipp = ip;
return 0;
}
@@ -2653,6 +2670,10 @@ xfs_idestroy_fork(
* It must free the inode itself and any buffers allocated for
* if_extents/if_data and if_broot. It must also free the lock
* associated with the inode.
+ *
+ * Note: because we don't initialise everything on reallocation out
+ * of the zone, we must ensure we nullify everything correctly before
+ * freeing the structure.
*/
void
xfs_idestroy(
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 47265b8..ab70cf3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -237,7 +237,6 @@ typedef struct xfs_inode {
/* Inode linking and identification information. */
struct xfs_mount *i_mount; /* fs mount struct ptr */
struct list_head i_reclaim; /* reclaim list */
- struct inode *i_vnode; /* vnode backpointer */
struct xfs_dquot *i_udquot; /* user dquot */
struct xfs_dquot *i_gdquot; /* group dquot */
@@ -271,6 +270,10 @@ typedef struct xfs_inode {
xfs_fsize_t i_size; /* in-memory size */
xfs_fsize_t i_new_size; /* size when write completes */
atomic_t i_iocount; /* outstanding I/O count */
+
+ /* VFS inode */
+ struct inode i_vnode; /* embedded VFS inode */
+
/* Trace buffers per inode. */
#ifdef XFS_INODE_TRACE
struct ktrace *i_trace; /* general inode trace */
@@ -298,13 +301,13 @@ typedef struct xfs_inode {
/* Convert from vfs inode to xfs inode */
static inline struct xfs_inode *XFS_I(struct inode *inode)
{
- return (struct xfs_inode *)inode->i_private;
+ return container_of(inode, struct xfs_inode, i_vnode);
}
/* convert from xfs inode to vfs inode */
static inline struct inode *VFS_I(struct xfs_inode *ip)
{
- return (struct inode *)ip->i_vnode;
+ return &ip->i_vnode;
}
/*
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8ab88ba..8bd8f19 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2808,6 +2808,7 @@ xfs_reclaim(
if (!ip->i_update_core && (ip->i_itemp == NULL)) {
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_iflock(ip);
+ xfs_iflags_set(ip, XFS_IRECLAIMABLE);
return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
} else {
xfs_mount_t *mp = ip->i_mount;
@@ -2816,8 +2817,6 @@ xfs_reclaim(
XFS_MOUNT_ILOCK(mp);
spin_lock(&ip->i_flags_lock);
__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
- VFS_I(ip)->i_private = NULL;
- ip->i_vnode = NULL;
spin_unlock(&ip->i_flags_lock);
list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
XFS_MOUNT_IUNLOCK(mp);
@@ -2832,10 +2831,6 @@ xfs_finish_reclaim(
int sync_mode)
{
xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino);
- struct inode *vp = VFS_I(ip);
-
- if (vp && VN_BAD(vp))
- goto reclaim;
/* The hash lock here protects a thread in xfs_iget_core from
* racing with us on linking the inode back with a vnode.
@@ -2845,7 +2840,7 @@ xfs_finish_reclaim(
write_lock(&pag->pag_ici_lock);
spin_lock(&ip->i_flags_lock);
if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
- (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) && vp == NULL)) {
+ !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
spin_unlock(&ip->i_flags_lock);
write_unlock(&pag->pag_ici_lock);
if (locked) {
@@ -2879,15 +2874,13 @@ xfs_finish_reclaim(
* In the case of a forced shutdown we rely on xfs_iflush() to
* wait for the inode to be unpinned before returning an error.
*/
- if (xfs_iflush(ip, sync_mode) == 0) {
+ if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
/* synchronize with xfs_iflush_done */
xfs_iflock(ip);
xfs_ifunlock(ip);
}
xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- reclaim:
xfs_ireclaim(ip);
return 0;
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
` (4 preceding siblings ...)
2008-10-07 21:52 ` [PATCH 5/5] XFS: Combine the XFS and Linux inodes V3 Dave Chinner
@ 2008-10-09 4:21 ` Dave Chinner
2008-10-09 7:02 ` Christoph Hellwig
5 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-09 4:21 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
Folks,
The following patch fixes a use after free I just found.
It appears that switching between SLAB and SLUB seems to
turn off slab/slub memory poisoning, so i dƣdn't realise
I'd be running for some time without poisoning turned on.
Once I turned poisoning back on I found this use-after-free
immediately on the first unmount trying to reclaim a clean
realtime bitmap inode.
With this patch, the netire patchset that I posted yesterday
passes xfsqa with memory poisoning turned on.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
XFS: Prevent use-after-free caused by synchronous inode reclaim
With the combined linux and XFS inode, we need to ensure that the
combined structure is not freed before the generic code is finished
with the inode. As it turns out, there is a case where the XFS inode
is freed before the linux inode - when xfs_reclaim() is called from
->clear_inode() on a clean inode, the xfs inode is freed during
that call. The generic code references the inode after the
->clear_inode() call, so this is a use after free situation.
Fix the problem by moving the xfs_reclaim() call to ->destroy_inode()
instead of in ->clear_inode(). This ensures the combined inode
structure is not freed until after the generic code has finished
with it.
---
fs/xfs/linux-2.6/xfs_super.c | 32 ++++++++++++++------------------
1 files changed, 14 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index b893f8c..7e5a9b7 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -875,13 +875,18 @@ xfs_fs_alloc_inode(
}
/*
- * we need to provide an empty inode free function to prevent
- * the generic code from trying to free our combined inode.
+ * Now that the generic code is guaranteed not to be accessing
+ * the linux inode, we can reclaim the inode.
*/
STATIC void
xfs_fs_destroy_inode(
struct inode *inode)
{
+ xfs_inode_t *ip = XFS_I(inode);
+
+ XFS_STATS_INC(vn_reclaim);
+ if (xfs_reclaim(ip))
+ panic("%s: cannot reclaim 0x%p\n", __func__, inode);
}
/*
@@ -958,22 +963,13 @@ xfs_fs_clear_inode(
{
xfs_inode_t *ip = XFS_I(inode);
- /*
- * ip can be null when xfs_iget_core calls xfs_idestroy if we
- * find an inode with di_mode == 0 but without IGET_CREATE set.
- */
- if (ip) {
- xfs_itrace_entry(ip);
- XFS_STATS_INC(vn_rele);
- XFS_STATS_INC(vn_remove);
- XFS_STATS_INC(vn_reclaim);
- XFS_STATS_DEC(vn_active);
-
- xfs_inactive(ip);
- xfs_iflags_clear(ip, XFS_IMODIFIED);
- if (xfs_reclaim(ip))
- panic("%s: cannot reclaim 0x%p\n", __func__, inode);
- }
+ xfs_itrace_entry(ip);
+ XFS_STATS_INC(vn_rele);
+ XFS_STATS_INC(vn_remove);
+ XFS_STATS_DEC(vn_active);
+
+ xfs_inactive(ip);
+ xfs_iflags_clear(ip, XFS_IMODIFIED);
}
STATIC void
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim
2008-10-09 4:21 ` [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim Dave Chinner
@ 2008-10-09 7:02 ` Christoph Hellwig
2008-10-09 8:07 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2008-10-09 7:02 UTC (permalink / raw)
To: xfs, linux-fsdevel
On Thu, Oct 09, 2008 at 03:21:34PM +1100, Dave Chinner wrote:
> Folks,
>
> The following patch fixes a use after free I just found.
> It appears that switching between SLAB and SLUB seems to
> turn off slab/slub memory poisoning, so i d??dn't realise
> I'd be running for some time without poisoning turned on.
> Once I turned poisoning back on I found this use-after-free
> immediately on the first unmount trying to reclaim a clean
> realtime bitmap inode.
>
> With this patch, the netire patchset that I posted yesterday
> passes xfsqa with memory poisoning turned on.
Looks good.
> + XFS_STATS_INC(vn_reclaim);
> + if (xfs_reclaim(ip))
> + panic("%s: cannot reclaim 0x%p\n", __func__, inode);
Eventually we should kill the return value from xfs_reclaim and just put
an assert directly into it. In fact given that xfs_reclaim is quite
OS dependent we might just merge the content directly into
destroy_inode.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim
2008-10-09 7:02 ` Christoph Hellwig
@ 2008-10-09 8:07 ` Dave Chinner
2008-10-09 8:20 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-09 8:07 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs, linux-fsdevel
On Thu, Oct 09, 2008 at 03:02:45AM -0400, Christoph Hellwig wrote:
> On Thu, Oct 09, 2008 at 03:21:34PM +1100, Dave Chinner wrote:
> > Folks,
> >
> > The following patch fixes a use after free I just found.
> > It appears that switching between SLAB and SLUB seems to
> > turn off slab/slub memory poisoning, so i d??dn't realise
> > I'd be running for some time without poisoning turned on.
> > Once I turned poisoning back on I found this use-after-free
> > immediately on the first unmount trying to reclaim a clean
> > realtime bitmap inode.
> >
> > With this patch, the netire patchset that I posted yesterday
> > passes xfsqa with memory poisoning turned on.
>
> Looks good.
>
> > + XFS_STATS_INC(vn_reclaim);
> > + if (xfs_reclaim(ip))
> > + panic("%s: cannot reclaim 0x%p\n", __func__, inode);
>
> Eventually we should kill the return value from xfs_reclaim and just put
> an assert directly into it. In fact given that xfs_reclaim is quite
> OS dependent we might just merge the content directly into
> destroy_inode.
Yeah, I was thinking of doing exactly that in this patch, but I
figured that I'd just do the minimum needed to fix the bug because
we're getting close to the next merge window.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim
2008-10-09 8:07 ` Dave Chinner
@ 2008-10-09 8:20 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2008-10-09 8:20 UTC (permalink / raw)
To: Christoph Hellwig, xfs, linux-fsdevel
On Thu, Oct 09, 2008 at 07:07:41PM +1100, Dave Chinner wrote:
> Yeah, I was thinking of doing exactly that in this patch, but I
> figured that I'd just do the minimum needed to fix the bug because
> we're getting close to the next merge window.
Yeah, should be a separate patch, just making public mental notes so
this gets eventually done..
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] Inode: Allow external initialisers
2008-10-14 7:00 ` Lachlan McIlroy
@ 2008-10-14 6:53 ` Dave Chinner
2008-10-14 12:55 ` Christoph Hellwig
1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-14 6:53 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: xfs, linux-fsdevel
On Tue, Oct 14, 2008 at 05:00:57PM +1000, Lachlan McIlroy wrote:
> Dave, this is modifying files outside fs/xfs. Who has reviewed this patch?
It's been posted to -fsdevel twice without anyone commenting
except for Christoph....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] Inode: Allow external initialisers
2008-10-07 21:52 ` [PATCH 3/5] Inode: Allow external initialisers Dave Chinner
@ 2008-10-14 7:00 ` Lachlan McIlroy
2008-10-14 6:53 ` Dave Chinner
2008-10-14 12:55 ` Christoph Hellwig
0 siblings, 2 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2008-10-14 7:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs, linux-fsdevel
Dave, this is modifying files outside fs/xfs. Who has reviewed this patch?
Dave Chinner wrote:
> To allow XFS to combine the XFS and linux inodes into a single
> structure, we need to drive inode lookup from the XFS inode cache,
> not the generic inode cache. This means that we need initialise a
> struct inode from a context outside alloc_inode() as it is no longer
> used by XFS.
>
> Factor and export the struct inode initialisation code from
> alloc_inode() to inode_init_always() as a counterpart to
> inode_init_once(). i.e. we have to call this init function for each
> inode instantiation (always), as opposed inode_init_once() which is
> only called on slab object instantiation (once).
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
> fs/inode.c | 152 +++++++++++++++++++++++++++++-----------------------
> include/linux/fs.h | 1 +
> 2 files changed, 85 insertions(+), 68 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 0487ddb..e7ee999 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -108,84 +108,100 @@ static void wake_up_inode(struct inode *inode)
> wake_up_bit(&inode->i_state, __I_LOCK);
> }
>
> -static struct inode *alloc_inode(struct super_block *sb)
> +/**
> + * inode_init_always - perform inode structure intialisation
> + * @sb - superblock inode belongs to.
> + * @inode - inode to initialise
> + *
> + * These are initializations that need to be done on every inode
> + * allocation as the fields are not initialised by slab allocation.
> + */
> +struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
> {
> static const struct address_space_operations empty_aops;
> static struct inode_operations empty_iops;
> static const struct file_operations empty_fops;
> - struct inode *inode;
> -
> - if (sb->s_op->alloc_inode)
> - inode = sb->s_op->alloc_inode(sb);
> - else
> - inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
>
> - if (inode) {
> - struct address_space * const mapping = &inode->i_data;
> -
> - inode->i_sb = sb;
> - inode->i_blkbits = sb->s_blocksize_bits;
> - inode->i_flags = 0;
> - atomic_set(&inode->i_count, 1);
> - inode->i_op = &empty_iops;
> - inode->i_fop = &empty_fops;
> - inode->i_nlink = 1;
> - atomic_set(&inode->i_writecount, 0);
> - inode->i_size = 0;
> - inode->i_blocks = 0;
> - inode->i_bytes = 0;
> - inode->i_generation = 0;
> + struct address_space * const mapping = &inode->i_data;
> +
> + inode->i_sb = sb;
> + inode->i_blkbits = sb->s_blocksize_bits;
> + inode->i_flags = 0;
> + atomic_set(&inode->i_count, 1);
> + inode->i_op = &empty_iops;
> + inode->i_fop = &empty_fops;
> + inode->i_nlink = 1;
> + atomic_set(&inode->i_writecount, 0);
> + inode->i_size = 0;
> + inode->i_blocks = 0;
> + inode->i_bytes = 0;
> + inode->i_generation = 0;
> #ifdef CONFIG_QUOTA
> - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
> + memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
> #endif
> - inode->i_pipe = NULL;
> - inode->i_bdev = NULL;
> - inode->i_cdev = NULL;
> - inode->i_rdev = 0;
> - inode->dirtied_when = 0;
> - if (security_inode_alloc(inode)) {
> - if (inode->i_sb->s_op->destroy_inode)
> - inode->i_sb->s_op->destroy_inode(inode);
> - else
> - kmem_cache_free(inode_cachep, (inode));
> - return NULL;
> - }
> + inode->i_pipe = NULL;
> + inode->i_bdev = NULL;
> + inode->i_cdev = NULL;
> + inode->i_rdev = 0;
> + inode->dirtied_when = 0;
> + if (security_inode_alloc(inode)) {
> + if (inode->i_sb->s_op->destroy_inode)
> + inode->i_sb->s_op->destroy_inode(inode);
> + else
> + kmem_cache_free(inode_cachep, (inode));
> + return NULL;
> + }
>
> - spin_lock_init(&inode->i_lock);
> - lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
> + spin_lock_init(&inode->i_lock);
> + lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
>
> - mutex_init(&inode->i_mutex);
> - lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
> + mutex_init(&inode->i_mutex);
> + lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
>
> - init_rwsem(&inode->i_alloc_sem);
> - lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
> + init_rwsem(&inode->i_alloc_sem);
> + lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
>
> - mapping->a_ops = &empty_aops;
> - mapping->host = inode;
> - mapping->flags = 0;
> - mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
> - mapping->assoc_mapping = NULL;
> - mapping->backing_dev_info = &default_backing_dev_info;
> - mapping->writeback_index = 0;
> + mapping->a_ops = &empty_aops;
> + mapping->host = inode;
> + mapping->flags = 0;
> + mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
> + mapping->assoc_mapping = NULL;
> + mapping->backing_dev_info = &default_backing_dev_info;
> + mapping->writeback_index = 0;
>
> - /*
> - * If the block_device provides a backing_dev_info for client
> - * inodes then use that. Otherwise the inode share the bdev's
> - * backing_dev_info.
> - */
> - if (sb->s_bdev) {
> - struct backing_dev_info *bdi;
> + /*
> + * If the block_device provides a backing_dev_info for client
> + * inodes then use that. Otherwise the inode share the bdev's
> + * backing_dev_info.
> + */
> + if (sb->s_bdev) {
> + struct backing_dev_info *bdi;
>
> - bdi = sb->s_bdev->bd_inode_backing_dev_info;
> - if (!bdi)
> - bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> - mapping->backing_dev_info = bdi;
> - }
> - inode->i_private = NULL;
> - inode->i_mapping = mapping;
> + bdi = sb->s_bdev->bd_inode_backing_dev_info;
> + if (!bdi)
> + bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> + mapping->backing_dev_info = bdi;
> }
> + inode->i_private = NULL;
> + inode->i_mapping = mapping;
> +
> return inode;
> }
> +EXPORT_SYMBOL(inode_init_always);
> +
> +static struct inode *alloc_inode(struct super_block *sb)
> +{
> + struct inode *inode;
> +
> + if (sb->s_op->alloc_inode)
> + inode = sb->s_op->alloc_inode(sb);
> + else
> + inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
> +
> + if (inode)
> + return inode_init_always(sb, inode);
> + return NULL;
> +}
>
> void destroy_inode(struct inode *inode)
> {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 580b513..ce55983 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1811,6 +1811,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int origin);
>
> extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
>
> +extern struct inode * inode_init_always(struct super_block *, struct inode *);
> extern void inode_init_once(struct inode *);
> extern void iput(struct inode *);
> extern struct inode * igrab(struct inode *);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] Inode: Allow external initialisers
2008-10-14 7:00 ` Lachlan McIlroy
2008-10-14 6:53 ` Dave Chinner
@ 2008-10-14 12:55 ` Christoph Hellwig
2008-10-15 1:09 ` Lachlan McIlroy
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2008-10-14 12:55 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: Dave Chinner, xfs, linux-fsdevel
On Tue, Oct 14, 2008 at 05:00:57PM +1000, Lachlan McIlroy wrote:
> Dave, this is modifying files outside fs/xfs. Who has reviewed this patch?
I have, and it's also Cc'ed to -fsdevel.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] Inode: Allow external initialisers
2008-10-14 12:55 ` Christoph Hellwig
@ 2008-10-15 1:09 ` Lachlan McIlroy
0 siblings, 0 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2008-10-15 1:09 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Dave Chinner, xfs, linux-fsdevel
Okay. Thanks Christoph.
Christoph Hellwig wrote:
> On Tue, Oct 14, 2008 at 05:00:57PM +1000, Lachlan McIlroy wrote:
>> Dave, this is modifying files outside fs/xfs. Who has reviewed this patch?
>
> I have, and it's also Cc'ed to -fsdevel.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-10-15 0:11 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-07 21:52 [PATCH 0/5] Combine the XFS and Linux inode structures V2 Dave Chinner
2008-10-07 21:52 ` [PATCH 1/5] XFS: factor xfs_iget_core() into hit and miss cases Dave Chinner
2008-10-07 21:52 ` [PATCH 2/5] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
2008-10-07 21:52 ` [PATCH 3/5] Inode: Allow external initialisers Dave Chinner
2008-10-14 7:00 ` Lachlan McIlroy
2008-10-14 6:53 ` Dave Chinner
2008-10-14 12:55 ` Christoph Hellwig
2008-10-15 1:09 ` Lachlan McIlroy
2008-10-07 21:52 ` [PATCH 4/5] Inode: Allow external list initialisation Dave Chinner
2008-10-07 21:52 ` [PATCH 5/5] XFS: Combine the XFS and Linux inodes V3 Dave Chinner
2008-10-09 4:21 ` [PATCH 6/5]: XFS: Prevent use-after-free caused by synchronous inode reclaim Dave Chinner
2008-10-09 7:02 ` Christoph Hellwig
2008-10-09 8:07 ` Dave Chinner
2008-10-09 8:20 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).