* [PATCH 0/3] XFS: Combine Linux and XFS inodes
@ 2008-09-13 14:10 Dave Chinner
2008-09-13 14:10 ` [PATCH 1/3] Inode: Allow external initialisers Dave Chinner
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Dave Chinner @ 2008-09-13 14:10 UTC (permalink / raw)
To: xfs; +Cc: linux-fsdevel
XFS currently has to deal with two separate inode lifecycles
which makes for complexity in inode lookups and reclaim. We
also have the problem of not always having a linux inode around
when it might be useful to have it.
To avoid these lifecycle problems, this series embedѕ the linux
inode inside the struct xfs_inode and changes the way we reference
the two inodes. We can no longer check for a null linux inode -
instead we have to check to see if it is valid or not by checking
either the linux inode or xfs inode state flags. While this means
that inodes waiting for reclaim use more memory, this is not the
common state for inodes and they will soon be completely freed so
the additional memeory use in this state is only temporary.
This combining of the inodes simplifies the inode and reclaim logic,
making it possible to do reclaim via radix tree tags (an upcoming
patch series) and to be able to use RCU locking on the radix trees.
The fact that we don't have a simple mechanism to determine the
reclaim state of the inode makes RCU locking very complex, and this
complexity is removed by having a combined inode structure.
This patch series also changes the way XFS caches inodes. It no
longer uses the linux inode cache as the primary lookup cache -
instead we rely solely on the XFS inode caches. This avoids the
inode_lock in lookups that hit the cache - we should get much
better parallelism out of inode lookup than we currently do now.
In future, we should also be able to cull duplicate fields out of
the xfs and linux inodes reducing the overall memory usage of
the active inode cache. This provides scope for continuing to
reduce the memory footprint of the XFS inode cache.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/3] Inode: Allow external initialisers 2008-09-13 14:10 [PATCH 0/3] XFS: Combine Linux and XFS inodes Dave Chinner @ 2008-09-13 14:10 ` Dave Chinner 2008-09-13 14:10 ` [PATCH 2/3] Inode: Allow external list initialisation Dave Chinner 2008-09-13 14:10 ` [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 Dave Chinner 2 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2008-09-13 14:10 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. Factor and export the struct inode initialisation code from alloc_inode() to inode_init_always() as a counterpart to inode_init_once(). i.e. we have to call this init function for each inode instantiation (always), as opposed inode_init_once() which is only called on slab object instantiation (once). Signed-off-by: Dave Chinner <david@fromorbit.com> --- fs/inode.c | 152 +++++++++++++++++++++++++++++----------------------- include/linux/fs.h | 1 + 2 files changed, 85 insertions(+), 68 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 0487ddb..e7ee999 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -108,84 +108,100 @@ static void wake_up_inode(struct inode *inode) wake_up_bit(&inode->i_state, __I_LOCK); } -static struct inode *alloc_inode(struct super_block *sb) +/** + * inode_init_always - perform inode structure intialisation + * @sb - superblock inode belongs to. + * @inode - inode to initialise + * + * These are initializations that need to be done on every inode + * allocation as the fields are not initialised by slab allocation. + */ +struct inode *inode_init_always(struct super_block *sb, struct inode *inode) { static const struct address_space_operations empty_aops; static struct inode_operations empty_iops; static const struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL); - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; + struct address_space * const mapping = &inode->i_data; + + inode->i_sb = sb; + inode->i_blkbits = sb->s_blocksize_bits; + inode->i_flags = 0; + atomic_set(&inode->i_count, 1); + inode->i_op = &empty_iops; + inode->i_fop = &empty_fops; + inode->i_nlink = 1; + atomic_set(&inode->i_writecount, 0); + inode->i_size = 0; + inode->i_blocks = 0; + inode->i_bytes = 0; + inode->i_generation = 0; #ifdef CONFIG_QUOTA - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); + memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); #endif - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_cdev = NULL; - inode->i_rdev = 0; - inode->dirtied_when = 0; - if (security_inode_alloc(inode)) { - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } + inode->i_pipe = NULL; + inode->i_bdev = NULL; + inode->i_cdev = NULL; + inode->i_rdev = 0; + inode->dirtied_when = 0; + if (security_inode_alloc(inode)) { + if (inode->i_sb->s_op->destroy_inode) + inode->i_sb->s_op->destroy_inode(inode); + else + kmem_cache_free(inode_cachep, (inode)); + return NULL; + } - spin_lock_init(&inode->i_lock); - lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key); + spin_lock_init(&inode->i_lock); + lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key); - mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key); + mutex_init(&inode->i_mutex); + lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key); - init_rwsem(&inode->i_alloc_sem); - lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key); + init_rwsem(&inode->i_alloc_sem); + lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key); - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->flags = 0; - mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE); - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = &default_backing_dev_info; - mapping->writeback_index = 0; + mapping->a_ops = &empty_aops; + mapping->host = inode; + mapping->flags = 0; + mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE); + mapping->assoc_mapping = NULL; + mapping->backing_dev_info = &default_backing_dev_info; + mapping->writeback_index = 0; - /* - * If the block_device provides a backing_dev_info for client - * inodes then use that. Otherwise the inode share the bdev's - * backing_dev_info. - */ - if (sb->s_bdev) { - struct backing_dev_info *bdi; + /* + * If the block_device provides a backing_dev_info for client + * inodes then use that. Otherwise the inode share the bdev's + * backing_dev_info. + */ + if (sb->s_bdev) { + struct backing_dev_info *bdi; - bdi = sb->s_bdev->bd_inode_backing_dev_info; - if (!bdi) - bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - mapping->backing_dev_info = bdi; - } - inode->i_private = NULL; - inode->i_mapping = mapping; + bdi = sb->s_bdev->bd_inode_backing_dev_info; + if (!bdi) + bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; + mapping->backing_dev_info = bdi; } + inode->i_private = NULL; + inode->i_mapping = mapping; + return inode; } +EXPORT_SYMBOL(inode_init_always); + +static struct inode *alloc_inode(struct super_block *sb) +{ + struct inode *inode; + + if (sb->s_op->alloc_inode) + inode = sb->s_op->alloc_inode(sb); + else + inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL); + + if (inode) + return inode_init_always(sb, inode); + return NULL; +} void destroy_inode(struct inode *inode) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 580b513..ce55983 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1811,6 +1811,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int origin); extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin); +extern struct inode * inode_init_always(struct super_block *, struct inode *); extern void inode_init_once(struct inode *); extern void iput(struct inode *); extern struct inode * igrab(struct inode *); -- 1.5.6 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] Inode: Allow external list initialisation 2008-09-13 14:10 [PATCH 0/3] XFS: Combine Linux and XFS inodes Dave Chinner 2008-09-13 14:10 ` [PATCH 1/3] Inode: Allow external initialisers Dave Chinner @ 2008-09-13 14:10 ` Dave Chinner 2008-09-13 14:10 ` [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 Dave Chinner 2 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2008-09-13 14:10 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel To allow XFS to combine the XFS and linux inodes into a single structure, we need to drive inode lookup from the XFS inode cache, not the generic inode cache. This means that we need initialise a struct inode from a context outside alloc_inode() as it is no longer used by XFS. After inode allocation and initialisation, we need to add the inode to the superblock list, the in-use list, hash it and do some accounting. This all needs to be done with the inode_lock held and there are already several places in fs/inode.c that do this list manipulation. Factor out the common code, add a locking wrapper and export the function so ti can be called from XFS. Signed-off-by: Dave Chinner <david@fromorbit.com> --- fs/inode.c | 67 +++++++++++++++++++++++++++++++++++---------------- include/linux/fs.h | 1 + 2 files changed, 47 insertions(+), 21 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index e7ee999..fbcf6c5 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -550,6 +550,49 @@ repeat: return node ? inode : NULL; } +static unsigned long hash(struct super_block *sb, unsigned long hashval) +{ + unsigned long tmp; + + tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) / + L1_CACHE_BYTES; + tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS); + return tmp & I_HASHMASK; +} + +static inline void +__inode_add_to_lists(struct super_block *sb, struct hlist_head *head, + struct inode *inode) +{ + inodes_stat.nr_inodes++; + list_add(&inode->i_list, &inode_in_use); + list_add(&inode->i_sb_list, &sb->s_inodes); + if (head) + hlist_add_head(&inode->i_hash, head); +} + +/** + * inode_add_to_lists - add a new inode to relevant lists + * @sb - superblock inode belongs to. + * @inode - inode to mark in use + * + * When an inode is allocated it needs to be accounted for, added to the in use + * list, the owning superblock and the inode hash. This needs to be done under + * the inode_lock, so export a function to do this rather than the inode lock + * itself. We calculate the hash list to add to here so it is all internal + * which requires the caller to have already set up the inode number in the + * inode to add. + */ +void inode_add_to_lists(struct super_block *sb, struct inode *inode) +{ + struct hlist_head *head = inode_hashtable + hash(sb, inode->i_ino); + + spin_lock(&inode_lock); + __inode_add_to_lists(sb, head, inode); + spin_unlock(&inode_lock); +} +EXPORT_SYMBOL_GPL(inode_add_to_lists); + /** * new_inode - obtain an inode * @sb: superblock @@ -577,9 +620,7 @@ struct inode *new_inode(struct super_block *sb) inode = alloc_inode(sb); if (inode) { spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - list_add(&inode->i_sb_list, &sb->s_inodes); + __inode_add_to_lists(sb, NULL, inode); inode->i_ino = ++last_ino; inode->i_state = 0; spin_unlock(&inode_lock); @@ -638,10 +679,7 @@ static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *h if (set(inode, data)) goto set_failed; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - list_add(&inode->i_sb_list, &sb->s_inodes); - hlist_add_head(&inode->i_hash, head); + __inode_add_to_lists(sb, head, inode); inode->i_state = I_LOCK|I_NEW; spin_unlock(&inode_lock); @@ -687,10 +725,7 @@ static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_he old = find_inode_fast(sb, head, ino); if (!old) { inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - list_add(&inode->i_sb_list, &sb->s_inodes); - hlist_add_head(&inode->i_hash, head); + __inode_add_to_lists(sb, head, inode); inode->i_state = I_LOCK|I_NEW; spin_unlock(&inode_lock); @@ -714,16 +749,6 @@ static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_he return inode; } -static unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp; - - tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) / - L1_CACHE_BYTES; - tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS); - return tmp & I_HASHMASK; -} - /** * iunique - get a unique inode number * @sb: superblock diff --git a/include/linux/fs.h b/include/linux/fs.h index ce55983..cf65c71 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1813,6 +1813,7 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin); extern struct inode * inode_init_always(struct super_block *, struct inode *); extern void inode_init_once(struct inode *); +extern void inode_add_to_lists(struct super_block *, struct inode *); extern void iput(struct inode *); extern struct inode * igrab(struct inode *); extern ino_t iunique(struct super_block *, ino_t); -- 1.5.6 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 2008-09-13 14:10 [PATCH 0/3] XFS: Combine Linux and XFS inodes Dave Chinner 2008-09-13 14:10 ` [PATCH 1/3] Inode: Allow external initialisers Dave Chinner 2008-09-13 14:10 ` [PATCH 2/3] Inode: Allow external list initialisation Dave Chinner @ 2008-09-13 14:10 ` Dave Chinner 2008-09-14 13:31 ` Christoph Hellwig 2 siblings, 1 reply; 6+ messages in thread From: Dave Chinner @ 2008-09-13 14:10 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() Signed-off-by: Dave Chinner <david@fromorbit.com> --- fs/xfs/linux-2.6/xfs_iops.c | 17 +++- fs/xfs/linux-2.6/xfs_super.c | 48 +++++------- fs/xfs/linux-2.6/xfs_vnode.c | 6 +- fs/xfs/xfs_iget.c | 167 +++++++++--------------------------------- fs/xfs/xfs_inode.c | 43 ++++++++--- fs/xfs/xfs_inode.h | 9 ++- fs/xfs/xfs_vnodeops.c | 13 +--- 7 files changed, 109 insertions(+), 194 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c index 3bfb3c0..37bb101 100644 --- a/fs/xfs/linux-2.6/xfs_iops.c +++ b/fs/xfs/linux-2.6/xfs_iops.c @@ -64,14 +64,14 @@ xfs_synchronize_atime( { struct inode *inode = VFS_I(ip); - if (inode) { + if (!(inode->i_state & I_CLEAR)) { ip->i_d.di_atime.t_sec = (__int32_t)inode->i_atime.tv_sec; ip->i_d.di_atime.t_nsec = (__int32_t)inode->i_atime.tv_nsec; } } /* - * If the linux inode exists, mark it dirty. + * If the linux inode is valid, mark it dirty. * Used when commiting a dirty inode into a transaction so that * the inode will get written back by the linux code */ @@ -81,7 +81,7 @@ xfs_mark_inode_dirty_sync( { struct inode *inode = VFS_I(ip); - if (inode) + if (!(inode->i_state & (I_WILL_FREE|I_FREEING|I_CLEAR))) mark_inode_dirty_sync(inode); } @@ -766,12 +766,21 @@ xfs_diflags_to_iflags( * When reading existing inodes from disk this is called directly * from xfs_iget, when creating a new inode it is called from * xfs_ialloc after setting up the inode. + * + * We are always called with an uninitialised linux inode here. + * We need to initialise the necessary fields and take a reference + * on it. */ void xfs_setup_inode( struct xfs_inode *ip) { - struct inode *inode = ip->i_vnode; + struct inode *inode = &ip->i_vnode; + + inode->i_ino = ip->i_ino; + inode->i_state = I_NEW|I_LOCK; + inode_add_to_lists(ip->i_mount->m_super, inode); + ASSERT(atomic_read(&inode->i_count) == 1); inode->i_mode = ip->i_d.di_mode; inode->i_nlink = ip->i_d.di_nlink; diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c index 8db6806..796734c 100644 --- a/fs/xfs/linux-2.6/xfs_super.c +++ b/fs/xfs/linux-2.6/xfs_super.c @@ -71,7 +71,6 @@ static struct quotactl_ops xfs_quotactl_operations; static struct super_operations xfs_super_operations; -static kmem_zone_t *xfs_vnode_zone; static kmem_zone_t *xfs_ioend_zone; mempool_t *xfs_ioend_pool; @@ -866,29 +865,25 @@ xfsaild_stop( } - +/* Catch misguided souls that try to use this interface on XFS */ STATIC struct inode * xfs_fs_alloc_inode( struct super_block *sb) { - return kmem_zone_alloc(xfs_vnode_zone, KM_SLEEP); + BUG(); } +/* + * we need to provide an empty inode free function to prevent + * the generic code from trying to free ouuur combined inode. + */ STATIC void xfs_fs_destroy_inode( - struct inode *inode) -{ - kmem_zone_free(xfs_vnode_zone, inode); -} - -STATIC void -xfs_fs_inode_init_once( - void *vnode) + struct inode *inode) { - inode_init_once((struct inode *)vnode); + return; } - /* * Slab object creation initialisation for the XFS inode. * This covers only the idempotent fields in the XFS inode; @@ -897,13 +892,18 @@ xfs_fs_inode_init_once( * fields in the xfs inode that left in the initialise state * when freeing the inode. */ -void -xfs_inode_init_once( +STATIC void +xfs_fs_inode_init_once( void *inode) { struct xfs_inode *ip = inode; memset(ip, 0, sizeof(struct xfs_inode)); + + /* vfs inode */ + inode_init_once(VFS_I(ip)); + + /* xfs inode */ atomic_set(&ip->i_iocount, 0); atomic_set(&ip->i_pincount, 0); spin_lock_init(&ip->i_flags_lock); @@ -976,8 +976,6 @@ xfs_fs_clear_inode( if (xfs_reclaim(ip)) panic("%s: cannot reclaim 0x%p\n", __func__, inode); } - - ASSERT(XFS_I(inode) == NULL); } STATIC void @@ -1804,16 +1802,10 @@ xfs_free_trace_bufs(void) STATIC int __init xfs_init_zones(void) { - xfs_vnode_zone = kmem_zone_init_flags(sizeof(struct inode), "xfs_vnode", - KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | - KM_ZONE_SPREAD, - xfs_fs_inode_init_once); - if (!xfs_vnode_zone) - goto out; xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend"); if (!xfs_ioend_zone) - goto out_destroy_vnode_zone; + goto out; xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE, xfs_ioend_zone); @@ -1829,6 +1821,7 @@ xfs_init_zones(void) "xfs_bmap_free_item"); if (!xfs_bmap_free_item_zone) goto out_destroy_log_ticket_zone; + xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t), "xfs_btree_cur"); if (!xfs_btree_cur_zone) @@ -1876,8 +1869,8 @@ xfs_init_zones(void) xfs_inode_zone = kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode", - KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | - KM_ZONE_SPREAD, xfs_inode_init_once); + KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD, + xfs_fs_inode_init_once); if (!xfs_inode_zone) goto out_destroy_efi_zone; @@ -1925,8 +1918,6 @@ xfs_init_zones(void) mempool_destroy(xfs_ioend_pool); out_destroy_ioend_zone: kmem_zone_destroy(xfs_ioend_zone); - out_destroy_vnode_zone: - kmem_zone_destroy(xfs_vnode_zone); out: return -ENOMEM; } @@ -1951,7 +1942,6 @@ xfs_destroy_zones(void) kmem_zone_destroy(xfs_log_ticket_zone); mempool_destroy(xfs_ioend_pool); kmem_zone_destroy(xfs_ioend_zone); - kmem_zone_destroy(xfs_vnode_zone); } diff --git a/fs/xfs/linux-2.6/xfs_vnode.c b/fs/xfs/linux-2.6/xfs_vnode.c index dceb6db..e5ec36e 100644 --- a/fs/xfs/linux-2.6/xfs_vnode.c +++ b/fs/xfs/linux-2.6/xfs_vnode.c @@ -90,11 +90,7 @@ vn_ioerror( */ static inline int xfs_icount(struct xfs_inode *ip) { - struct inode *inode = VFS_I(ip); - - if (!inode) - return atomic_read(&inode->i_count); - return -1; + return atomic_read(&VFS_I(ip)->i_count); } #define KTRACE_ENTER(ip, vk, s, line, ra) \ diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c index 12fe37e..533a375 100644 --- a/fs/xfs/xfs_iget.c +++ b/fs/xfs/xfs_iget.c @@ -44,77 +44,65 @@ */ static int xfs_iget_cache_hit( - struct inode *inode, struct xfs_perag *pag, struct xfs_inode *ip, int flags, int lock_flags) __releases(pag->pag_ici_lock) { struct xfs_mount *mp = ip->i_mount; - struct inode *old_inode; int error = 0; /* * If INEW is set this inode is being set up + * If IRECLAIM is set this inode is being torn down * Pause and try again. */ - if (xfs_iflags_test(ip, XFS_INEW)) { + if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) { error = EAGAIN; XFS_STATS_INC(xs_ig_frecycle); goto out_error; } - old_inode = ip->i_vnode; - if (old_inode == NULL) { + /* If IRECLAIMABLE is set, we've torn down the vfs inode part */ + if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) { + /* - * If IRECLAIM is set this inode is - * on its way out of the system, - * we need to pause and try again. + * If lookup is racing with unlink, then we should return an + * error immediately so we don't remove it from the reclaim + * list and potentially leak the inode. */ - if (xfs_iflags_test(ip, XFS_IRECLAIM)) { - error = EAGAIN; - XFS_STATS_INC(xs_ig_frecycle); + + if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) { + error = ENOENT; goto out_error; } - ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE)); + + xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); /* - * If lookup is racing with unlink, then we - * should return an error immediately so we - * don't remove it from the reclaim list and - * potentially leak the inode. + * We need to re-initialise the VFS inode as it has been + * 'freed' by the VFS. Do this here so we can deal with + * errors cleanly, then tag it so it can be set up correctly + * later. */ - if ((ip->i_d.di_mode == 0) && - !(flags & XFS_IGET_CREATE)) { - error = ENOENT; + if (!inode_init_always(mp->m_super, VFS_I(ip))) { + error = ENOMEM; goto out_error; } - xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); - + xfs_iflags_set(ip, XFS_INEW); xfs_iflags_clear(ip, XFS_IRECLAIMABLE); read_unlock(&pag->pag_ici_lock); XFS_MOUNT_ILOCK(mp); list_del_init(&ip->i_reclaim); XFS_MOUNT_IUNLOCK(mp); - - } else if (inode != old_inode) { - /* The inode is being torn down, pause and - * try again. - */ - if (old_inode->i_state & (I_FREEING | I_CLEAR)) { - error = EAGAIN; - XFS_STATS_INC(xs_ig_frecycle); - goto out_error; - } -/* Chances are the other vnode (the one in the inode) is being torn -* down right now, and we landed on top of it. Question is, what do -* we do? Unhook the old inode and hook up the new one? -*/ - cmn_err(CE_PANIC, - "xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p", - old_inode, inode); + } else if (!igrab(VFS_I(ip))) { + /* If the VFS inode is being torn down, pause and try again. */ + error = EAGAIN; + XFS_STATS_INC(xs_ig_frecycle); + goto out_error; } else { + /* we've got a live one */ read_unlock(&pag->pag_ici_lock); } @@ -214,11 +202,11 @@ out_destroy: /* * Look up an inode by number in the given file system. * The inode is looked up in the cache held in each AG. - * If the inode is found in the cache, attach it to the provided - * vnode. + * If the inode is found in the cache, initialise the vfs inode + * if necessary. * * If it is not in core, read it in from the file system's device, - * add it to the cache and attach the provided vnode. + * add it to the cache and initialise the vfs inode. * * The inode is locked according to the value of the lock_flags parameter. * This flag parameter indicates how and if the inode's IO lock and inode lock @@ -235,9 +223,8 @@ out_destroy: * bno -- the block number starting the buffer containing the inode, * if known (as by bulkstat), else 0. */ -STATIC int -xfs_iget_core( - struct inode *inode, +int +xfs_iget( xfs_mount_t *mp, xfs_trans_t *tp, xfs_ino_t ino, @@ -268,7 +255,7 @@ again: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (ip) { - error = xfs_iget_cache_hit(inode, pag, ip, flags, lock_flags); + error = xfs_iget_cache_hit(pag, ip, flags, lock_flags); if (error) goto out_error_or_again; } else { @@ -282,23 +269,16 @@ again: } xfs_put_perag(mp, pag); - ASSERT(ip->i_df.if_ext_max == - XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t)); - xfs_iflags_set(ip, XFS_IMODIFIED); *ipp = ip; - /* - * Set up the Linux with the Linux inode. - */ - ip->i_vnode = inode; - inode->i_private = ip; - + ASSERT(ip->i_df.if_ext_max == + XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t)); /* * If we have a real type for an on-disk inode, we can set ops(&unlock) * now. If it's a new inode being created, xfs_ialloc will handle it. */ - if (ip->i_d.di_mode != 0) + if (xfs_iflags_test(ip, XFS_INEW) && ip->i_d.di_mode != 0) xfs_setup_inode(ip); return 0; @@ -313,75 +293,6 @@ out_error_or_again: /* - * The 'normal' internal xfs_iget, if needed it will - * 'allocate', or 'get', the vnode. - */ -int -xfs_iget( - xfs_mount_t *mp, - xfs_trans_t *tp, - xfs_ino_t ino, - uint flags, - uint lock_flags, - xfs_inode_t **ipp, - xfs_daddr_t bno) -{ - struct inode *inode; - xfs_inode_t *ip; - int error; - - XFS_STATS_INC(xs_ig_attempts); - -retry: - inode = iget_locked(mp->m_super, ino); - if (!inode) - /* If we got no inode we are out of memory */ - return ENOMEM; - - if (inode->i_state & I_NEW) { - XFS_STATS_INC(vn_active); - XFS_STATS_INC(vn_alloc); - - error = xfs_iget_core(inode, mp, tp, ino, flags, - lock_flags, ipp, bno); - if (error) { - make_bad_inode(inode); - if (inode->i_state & I_NEW) - unlock_new_inode(inode); - iput(inode); - } - return error; - } - - /* - * If the inode is not fully constructed due to - * filehandle mismatches wait for the inode to go - * away and try again. - * - * iget_locked will call __wait_on_freeing_inode - * to wait for the inode to go away. - */ - if (is_bad_inode(inode)) { - iput(inode); - delay(1); - goto retry; - } - - ip = XFS_I(inode); - if (!ip) { - iput(inode); - delay(1); - goto retry; - } - - if (lock_flags != 0) - xfs_ilock(ip, lock_flags); - XFS_STATS_INC(xs_ig_found); - *ipp = ip; - return 0; -} - -/* * Look for the inode corresponding to the given ino in the hash table. * If it is there and its i_transp pointer matches tp, return it. * Otherwise, return NULL. @@ -481,14 +392,6 @@ xfs_ireclaim(xfs_inode_t *ip) XFS_QM_DQDETACH(ip->i_mount, ip); /* - * Pull our behavior descriptor from the vnode chain. - */ - if (ip->i_vnode) { - ip->i_vnode->i_private = NULL; - ip->i_vnode = NULL; - } - - /* * Free all memory associated with the inode. */ xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7c9d0a7..568bc8e 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -812,6 +812,16 @@ xfs_inode_alloc( ASSERT(list_empty(&ip->i_reclaim)); ASSERT(completion_done(&ip->i_flush)); + /* + * initialise the VFS inode here to get failures + * out of the way early. + */ + if (!inode_init_always(mp->m_super, VFS_I(ip))) { + kmem_zone_free(xfs_inode_zone, ip); + return NULL; + } + + /* initialise the xfs inode */ ip->i_ino = ino; ip->i_mount = mp; ip->i_blkno = 0; @@ -1086,6 +1096,7 @@ xfs_ialloc( uint flags; int error; timespec_t tv; + int filestreams = 0; /* * Call the space management code to pick @@ -1093,9 +1104,8 @@ xfs_ialloc( */ error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc, ialloc_context, call_again, &ino); - if (error != 0) { + if (error) return error; - } if (*call_again || ino == NULLFSINO) { *ipp = NULL; return 0; @@ -1109,9 +1119,8 @@ xfs_ialloc( */ error = xfs_trans_iget(tp->t_mountp, tp, ino, XFS_IGET_CREATE, XFS_ILOCK_EXCL, &ip); - if (error != 0) { + if (error) return error; - } ASSERT(ip != NULL); ip->i_d.di_mode = (__uint16_t)mode; @@ -1192,13 +1201,12 @@ xfs_ialloc( flags |= XFS_ILOG_DEV; break; case S_IFREG: - if (pip && xfs_inode_is_filestream(pip)) { - error = xfs_filestream_associate(pip, ip); - if (error < 0) - return -error; - if (!error) - xfs_iflags_set(ip, XFS_IFILESTREAM); - } + /* + * we can't set up filestreams until after the VFS inode + * is set up properly. + */ + if (pip && xfs_inode_is_filestream(pip)) + filestreams = 1; /* fall through */ case S_IFDIR: if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) { @@ -1264,6 +1272,15 @@ xfs_ialloc( /* now that we have an i_mode we can setup inode ops and unlock */ xfs_setup_inode(ip); + /* now we have set up the vfs inode we can associate the filestream */ + if (filestreams) { + error = xfs_filestream_associate(pip, ip); + if (error < 0) + return -error; + if (!error) + xfs_iflags_set(ip, XFS_IFILESTREAM); + } + *ipp = ip; return 0; } @@ -2654,6 +2671,10 @@ xfs_idestroy_fork( * It must free the inode itself and any buffers allocated for * if_extents/if_data and if_broot. It must also free the lock * associated with the inode. + * + * Note: because we don't initialise everything on reallocation out + * of the zone, we must ensure we nullify everything correctly before + * freeing the structure. */ void xfs_idestroy( diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 813c9b0..9c788ab 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -195,7 +195,6 @@ typedef struct xfs_inode { /* Inode linking and identification information. */ struct xfs_mount *i_mount; /* fs mount struct ptr */ struct list_head i_reclaim; /* reclaim list */ - struct inode *i_vnode; /* vnode backpointer */ struct xfs_dquot *i_udquot; /* user dquot */ struct xfs_dquot *i_gdquot; /* group dquot */ @@ -229,6 +228,10 @@ typedef struct xfs_inode { xfs_fsize_t i_size; /* in-memory size */ xfs_fsize_t i_new_size; /* size when write completes */ atomic_t i_iocount; /* outstanding I/O count */ + + /* VFS inode */ + struct inode i_vnode; /* embedded VFS inode */ + /* Trace buffers per inode. */ #ifdef XFS_INODE_TRACE struct ktrace *i_trace; /* general inode trace */ @@ -256,13 +259,13 @@ typedef struct xfs_inode { /* Convert from vfs inode to xfs inode */ static inline struct xfs_inode *XFS_I(struct inode *inode) { - return (struct xfs_inode *)inode->i_private; + return container_of(inode, struct xfs_inode, i_vnode); } /* convert from xfs inode to vfs inode */ static inline struct inode *VFS_I(struct xfs_inode *ip) { - return (struct inode *)ip->i_vnode; + return &ip->i_vnode; } /* diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c index 6bce4ad..8898985 100644 --- a/fs/xfs/xfs_vnodeops.c +++ b/fs/xfs/xfs_vnodeops.c @@ -2800,6 +2800,7 @@ xfs_reclaim( if (!ip->i_update_core && (ip->i_itemp == NULL)) { xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_iflock(ip); + xfs_iflags_set(ip, XFS_IRECLAIMABLE); return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC); } else { xfs_mount_t *mp = ip->i_mount; @@ -2808,8 +2809,6 @@ xfs_reclaim( XFS_MOUNT_ILOCK(mp); spin_lock(&ip->i_flags_lock); __xfs_iflags_set(ip, XFS_IRECLAIMABLE); - VFS_I(ip)->i_private = NULL; - ip->i_vnode = NULL; spin_unlock(&ip->i_flags_lock); list_add_tail(&ip->i_reclaim, &mp->m_del_inodes); XFS_MOUNT_IUNLOCK(mp); @@ -2824,10 +2823,6 @@ xfs_finish_reclaim( int sync_mode) { xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino); - struct inode *vp = VFS_I(ip); - - if (vp && VN_BAD(vp)) - goto reclaim; /* The hash lock here protects a thread in xfs_iget_core from * racing with us on linking the inode back with a vnode. @@ -2837,7 +2832,7 @@ xfs_finish_reclaim( write_lock(&pag->pag_ici_lock); spin_lock(&ip->i_flags_lock); if (__xfs_iflags_test(ip, XFS_IRECLAIM) || - (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) && vp == NULL)) { + !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) { spin_unlock(&ip->i_flags_lock); write_unlock(&pag->pag_ici_lock); if (locked) { @@ -2871,15 +2866,13 @@ xfs_finish_reclaim( * In the case of a forced shutdown we rely on xfs_iflush() to * wait for the inode to be unpinned before returning an error. */ - if (xfs_iflush(ip, sync_mode) == 0) { + if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) { /* synchronize with xfs_iflush_done */ xfs_iflock(ip); xfs_ifunlock(ip); } xfs_iunlock(ip, XFS_ILOCK_EXCL); - - reclaim: xfs_ireclaim(ip); return 0; } -- 1.5.6 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 2008-09-13 14:10 ` [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 Dave Chinner @ 2008-09-14 13:31 ` Christoph Hellwig 2008-09-16 4:47 ` Dave Chinner 0 siblings, 1 reply; 6+ messages in thread From: Christoph Hellwig @ 2008-09-14 13:31 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel On Sun, Sep 14, 2008 at 12:10:27AM +1000, Dave Chinner wrote: > + * we need to provide an empty inode free function to prevent > + * the generic code from trying to free ouuur combined inode. ^^^^ spelling? > - > -STATIC void > -xfs_fs_inode_init_once( > - void *vnode) > + struct inode *inode) > { > - inode_init_once((struct inode *)vnode); > + return; > } No need for a no-argument return at the end of the function. > static inline int xfs_icount(struct xfs_inode *ip) > { > - struct inode *inode = VFS_I(ip); > - > - if (!inode) > - return atomic_read(&inode->i_count); > - return -1; > + return atomic_read(&VFS_I(ip)->i_count); > } At this point we can just kill this helper - there's only one caller anyway. > - if (xfs_iflush(ip, sync_mode) == 0) { > + if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) { Why don't you switch to is_bad_inode directly instead of fixing this up in a later patch? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 2008-09-14 13:31 ` Christoph Hellwig @ 2008-09-16 4:47 ` Dave Chinner 0 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2008-09-16 4:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs, linux-fsdevel On Sun, Sep 14, 2008 at 09:31:28AM -0400, Christoph Hellwig wrote: > On Sun, Sep 14, 2008 at 12:10:27AM +1000, Dave Chinner wrote: > > + * we need to provide an empty inode free function to prevent > > + * the generic code from trying to free ouuur combined inode. > > ^^^^ spelling? Damn keeey repeat.... ;) > > -STATIC void > > -xfs_fs_inode_init_once( > > - void *vnode) > > + struct inode *inode) > > { > > - inode_init_once((struct inode *)vnode); > > + return; > > } > > No need for a no-argument return at the end of the function. Ok. Will fix. > > static inline int xfs_icount(struct xfs_inode *ip) > > { > > - struct inode *inode = VFS_I(ip); > > - > > - if (!inode) > > - return atomic_read(&inode->i_count); > > - return -1; > > + return atomic_read(&VFS_I(ip)->i_count); > > } > > At this point we can just kill this helper - there's only one caller > anyway. Killed. > > > - if (xfs_iflush(ip, sync_mode) == 0) { > > + if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) { > > Why don't you switch to is_bad_inode directly instead of fixing this > up in a later patch? Fixed. Updated patch below. Cheers, Dave. -- Dave Chinner david@fromorbit.com XFS: Combine the XFS and Linux inodes V3 To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 3 o kill xfs_icount() Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() --- fs/xfs/linux-2.6/xfs_iops.c | 17 +++- fs/xfs/linux-2.6/xfs_super.c | 48 +++++-------- fs/xfs/linux-2.6/xfs_vnode.c | 15 +---- fs/xfs/xfs_iget.c | 167 +++++++++--------------------------------- fs/xfs/xfs_inode.c | 43 ++++++++--- fs/xfs/xfs_inode.h | 9 ++- fs/xfs/xfs_vnodeops.c | 13 +--- 7 files changed, 108 insertions(+), 204 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c index 3bfb3c0..37bb101 100644 --- a/fs/xfs/linux-2.6/xfs_iops.c +++ b/fs/xfs/linux-2.6/xfs_iops.c @@ -64,14 +64,14 @@ xfs_synchronize_atime( { struct inode *inode = VFS_I(ip); - if (inode) { + if (!(inode->i_state & I_CLEAR)) { ip->i_d.di_atime.t_sec = (__int32_t)inode->i_atime.tv_sec; ip->i_d.di_atime.t_nsec = (__int32_t)inode->i_atime.tv_nsec; } } /* - * If the linux inode exists, mark it dirty. + * If the linux inode is valid, mark it dirty. * Used when commiting a dirty inode into a transaction so that * the inode will get written back by the linux code */ @@ -81,7 +81,7 @@ xfs_mark_inode_dirty_sync( { struct inode *inode = VFS_I(ip); - if (inode) + if (!(inode->i_state & (I_WILL_FREE|I_FREEING|I_CLEAR))) mark_inode_dirty_sync(inode); } @@ -766,12 +766,21 @@ xfs_diflags_to_iflags( * When reading existing inodes from disk this is called directly * from xfs_iget, when creating a new inode it is called from * xfs_ialloc after setting up the inode. + * + * We are always called with an uninitialised linux inode here. + * We need to initialise the necessary fields and take a reference + * on it. */ void xfs_setup_inode( struct xfs_inode *ip) { - struct inode *inode = ip->i_vnode; + struct inode *inode = &ip->i_vnode; + + inode->i_ino = ip->i_ino; + inode->i_state = I_NEW|I_LOCK; + inode_add_to_lists(ip->i_mount->m_super, inode); + ASSERT(atomic_read(&inode->i_count) == 1); inode->i_mode = ip->i_d.di_mode; inode->i_nlink = ip->i_d.di_nlink; diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c index 8db6806..2b04415 100644 --- a/fs/xfs/linux-2.6/xfs_super.c +++ b/fs/xfs/linux-2.6/xfs_super.c @@ -71,7 +71,6 @@ static struct quotactl_ops xfs_quotactl_operations; static struct super_operations xfs_super_operations; -static kmem_zone_t *xfs_vnode_zone; static kmem_zone_t *xfs_ioend_zone; mempool_t *xfs_ioend_pool; @@ -866,29 +865,24 @@ xfsaild_stop( } - +/* Catch misguided souls that try to use this interface on XFS */ STATIC struct inode * xfs_fs_alloc_inode( struct super_block *sb) { - return kmem_zone_alloc(xfs_vnode_zone, KM_SLEEP); + BUG(); } +/* + * we need to provide an empty inode free function to prevent + * the generic code from trying to free our combined inode. + */ STATIC void xfs_fs_destroy_inode( - struct inode *inode) -{ - kmem_zone_free(xfs_vnode_zone, inode); -} - -STATIC void -xfs_fs_inode_init_once( - void *vnode) + struct inode *inode) { - inode_init_once((struct inode *)vnode); } - /* * Slab object creation initialisation for the XFS inode. * This covers only the idempotent fields in the XFS inode; @@ -897,13 +891,18 @@ xfs_fs_inode_init_once( * fields in the xfs inode that left in the initialise state * when freeing the inode. */ -void -xfs_inode_init_once( +STATIC void +xfs_fs_inode_init_once( void *inode) { struct xfs_inode *ip = inode; memset(ip, 0, sizeof(struct xfs_inode)); + + /* vfs inode */ + inode_init_once(VFS_I(ip)); + + /* xfs inode */ atomic_set(&ip->i_iocount, 0); atomic_set(&ip->i_pincount, 0); spin_lock_init(&ip->i_flags_lock); @@ -921,7 +920,6 @@ xfs_inode_init_once( */ init_completion(&ip->i_flush); complete(&ip->i_flush); - } /* @@ -976,8 +974,6 @@ xfs_fs_clear_inode( if (xfs_reclaim(ip)) panic("%s: cannot reclaim 0x%p\n", __func__, inode); } - - ASSERT(XFS_I(inode) == NULL); } STATIC void @@ -1804,16 +1800,10 @@ xfs_free_trace_bufs(void) STATIC int __init xfs_init_zones(void) { - xfs_vnode_zone = kmem_zone_init_flags(sizeof(struct inode), "xfs_vnode", - KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | - KM_ZONE_SPREAD, - xfs_fs_inode_init_once); - if (!xfs_vnode_zone) - goto out; xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend"); if (!xfs_ioend_zone) - goto out_destroy_vnode_zone; + goto out; xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE, xfs_ioend_zone); @@ -1829,6 +1819,7 @@ xfs_init_zones(void) "xfs_bmap_free_item"); if (!xfs_bmap_free_item_zone) goto out_destroy_log_ticket_zone; + xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t), "xfs_btree_cur"); if (!xfs_btree_cur_zone) @@ -1876,8 +1867,8 @@ xfs_init_zones(void) xfs_inode_zone = kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode", - KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | - KM_ZONE_SPREAD, xfs_inode_init_once); + KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD, + xfs_fs_inode_init_once); if (!xfs_inode_zone) goto out_destroy_efi_zone; @@ -1925,8 +1916,6 @@ xfs_init_zones(void) mempool_destroy(xfs_ioend_pool); out_destroy_ioend_zone: kmem_zone_destroy(xfs_ioend_zone); - out_destroy_vnode_zone: - kmem_zone_destroy(xfs_vnode_zone); out: return -ENOMEM; } @@ -1951,7 +1940,6 @@ xfs_destroy_zones(void) kmem_zone_destroy(xfs_log_ticket_zone); mempool_destroy(xfs_ioend_pool); kmem_zone_destroy(xfs_ioend_zone); - kmem_zone_destroy(xfs_vnode_zone); } diff --git a/fs/xfs/linux-2.6/xfs_vnode.c b/fs/xfs/linux-2.6/xfs_vnode.c index dceb6db..933f89a 100644 --- a/fs/xfs/linux-2.6/xfs_vnode.c +++ b/fs/xfs/linux-2.6/xfs_vnode.c @@ -84,25 +84,12 @@ vn_ioerror( #ifdef XFS_INODE_TRACE -/* - * Reference count of Linux inode if present, -1 if the xfs_inode - * has no associated Linux inode. - */ -static inline int xfs_icount(struct xfs_inode *ip) -{ - struct inode *inode = VFS_I(ip); - - if (!inode) - return atomic_read(&inode->i_count); - return -1; -} - #define KTRACE_ENTER(ip, vk, s, line, ra) \ ktrace_enter( (ip)->i_trace, \ /* 0 */ (void *)(__psint_t)(vk), \ /* 1 */ (void *)(s), \ /* 2 */ (void *)(__psint_t) line, \ -/* 3 */ (void *)(__psint_t)xfs_icount(ip), \ +/* 3 */ (void *)(__psint_t)atomic_read(&VFS_I(ip)->i_count) \ /* 4 */ (void *)(ra), \ /* 5 */ NULL, \ /* 6 */ (void *)(__psint_t)current_cpu(), \ diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c index 12fe37e..533a375 100644 --- a/fs/xfs/xfs_iget.c +++ b/fs/xfs/xfs_iget.c @@ -44,77 +44,65 @@ */ static int xfs_iget_cache_hit( - struct inode *inode, struct xfs_perag *pag, struct xfs_inode *ip, int flags, int lock_flags) __releases(pag->pag_ici_lock) { struct xfs_mount *mp = ip->i_mount; - struct inode *old_inode; int error = 0; /* * If INEW is set this inode is being set up + * If IRECLAIM is set this inode is being torn down * Pause and try again. */ - if (xfs_iflags_test(ip, XFS_INEW)) { + if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) { error = EAGAIN; XFS_STATS_INC(xs_ig_frecycle); goto out_error; } - old_inode = ip->i_vnode; - if (old_inode == NULL) { + /* If IRECLAIMABLE is set, we've torn down the vfs inode part */ + if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) { + /* - * If IRECLAIM is set this inode is - * on its way out of the system, - * we need to pause and try again. + * If lookup is racing with unlink, then we should return an + * error immediately so we don't remove it from the reclaim + * list and potentially leak the inode. */ - if (xfs_iflags_test(ip, XFS_IRECLAIM)) { - error = EAGAIN; - XFS_STATS_INC(xs_ig_frecycle); + + if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) { + error = ENOENT; goto out_error; } - ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE)); + + xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); /* - * If lookup is racing with unlink, then we - * should return an error immediately so we - * don't remove it from the reclaim list and - * potentially leak the inode. + * We need to re-initialise the VFS inode as it has been + * 'freed' by the VFS. Do this here so we can deal with + * errors cleanly, then tag it so it can be set up correctly + * later. */ - if ((ip->i_d.di_mode == 0) && - !(flags & XFS_IGET_CREATE)) { - error = ENOENT; + if (!inode_init_always(mp->m_super, VFS_I(ip))) { + error = ENOMEM; goto out_error; } - xfs_itrace_exit_tag(ip, "xfs_iget.alloc"); - + xfs_iflags_set(ip, XFS_INEW); xfs_iflags_clear(ip, XFS_IRECLAIMABLE); read_unlock(&pag->pag_ici_lock); XFS_MOUNT_ILOCK(mp); list_del_init(&ip->i_reclaim); XFS_MOUNT_IUNLOCK(mp); - - } else if (inode != old_inode) { - /* The inode is being torn down, pause and - * try again. - */ - if (old_inode->i_state & (I_FREEING | I_CLEAR)) { - error = EAGAIN; - XFS_STATS_INC(xs_ig_frecycle); - goto out_error; - } -/* Chances are the other vnode (the one in the inode) is being torn -* down right now, and we landed on top of it. Question is, what do -* we do? Unhook the old inode and hook up the new one? -*/ - cmn_err(CE_PANIC, - "xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p", - old_inode, inode); + } else if (!igrab(VFS_I(ip))) { + /* If the VFS inode is being torn down, pause and try again. */ + error = EAGAIN; + XFS_STATS_INC(xs_ig_frecycle); + goto out_error; } else { + /* we've got a live one */ read_unlock(&pag->pag_ici_lock); } @@ -214,11 +202,11 @@ out_destroy: /* * Look up an inode by number in the given file system. * The inode is looked up in the cache held in each AG. - * If the inode is found in the cache, attach it to the provided - * vnode. + * If the inode is found in the cache, initialise the vfs inode + * if necessary. * * If it is not in core, read it in from the file system's device, - * add it to the cache and attach the provided vnode. + * add it to the cache and initialise the vfs inode. * * The inode is locked according to the value of the lock_flags parameter. * This flag parameter indicates how and if the inode's IO lock and inode lock @@ -235,9 +223,8 @@ out_destroy: * bno -- the block number starting the buffer containing the inode, * if known (as by bulkstat), else 0. */ -STATIC int -xfs_iget_core( - struct inode *inode, +int +xfs_iget( xfs_mount_t *mp, xfs_trans_t *tp, xfs_ino_t ino, @@ -268,7 +255,7 @@ again: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (ip) { - error = xfs_iget_cache_hit(inode, pag, ip, flags, lock_flags); + error = xfs_iget_cache_hit(pag, ip, flags, lock_flags); if (error) goto out_error_or_again; } else { @@ -282,23 +269,16 @@ again: } xfs_put_perag(mp, pag); - ASSERT(ip->i_df.if_ext_max == - XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t)); - xfs_iflags_set(ip, XFS_IMODIFIED); *ipp = ip; - /* - * Set up the Linux with the Linux inode. - */ - ip->i_vnode = inode; - inode->i_private = ip; - + ASSERT(ip->i_df.if_ext_max == + XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t)); /* * If we have a real type for an on-disk inode, we can set ops(&unlock) * now. If it's a new inode being created, xfs_ialloc will handle it. */ - if (ip->i_d.di_mode != 0) + if (xfs_iflags_test(ip, XFS_INEW) && ip->i_d.di_mode != 0) xfs_setup_inode(ip); return 0; @@ -313,75 +293,6 @@ out_error_or_again: /* - * The 'normal' internal xfs_iget, if needed it will - * 'allocate', or 'get', the vnode. - */ -int -xfs_iget( - xfs_mount_t *mp, - xfs_trans_t *tp, - xfs_ino_t ino, - uint flags, - uint lock_flags, - xfs_inode_t **ipp, - xfs_daddr_t bno) -{ - struct inode *inode; - xfs_inode_t *ip; - int error; - - XFS_STATS_INC(xs_ig_attempts); - -retry: - inode = iget_locked(mp->m_super, ino); - if (!inode) - /* If we got no inode we are out of memory */ - return ENOMEM; - - if (inode->i_state & I_NEW) { - XFS_STATS_INC(vn_active); - XFS_STATS_INC(vn_alloc); - - error = xfs_iget_core(inode, mp, tp, ino, flags, - lock_flags, ipp, bno); - if (error) { - make_bad_inode(inode); - if (inode->i_state & I_NEW) - unlock_new_inode(inode); - iput(inode); - } - return error; - } - - /* - * If the inode is not fully constructed due to - * filehandle mismatches wait for the inode to go - * away and try again. - * - * iget_locked will call __wait_on_freeing_inode - * to wait for the inode to go away. - */ - if (is_bad_inode(inode)) { - iput(inode); - delay(1); - goto retry; - } - - ip = XFS_I(inode); - if (!ip) { - iput(inode); - delay(1); - goto retry; - } - - if (lock_flags != 0) - xfs_ilock(ip, lock_flags); - XFS_STATS_INC(xs_ig_found); - *ipp = ip; - return 0; -} - -/* * Look for the inode corresponding to the given ino in the hash table. * If it is there and its i_transp pointer matches tp, return it. * Otherwise, return NULL. @@ -481,14 +392,6 @@ xfs_ireclaim(xfs_inode_t *ip) XFS_QM_DQDETACH(ip->i_mount, ip); /* - * Pull our behavior descriptor from the vnode chain. - */ - if (ip->i_vnode) { - ip->i_vnode->i_private = NULL; - ip->i_vnode = NULL; - } - - /* * Free all memory associated with the inode. */ xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7c9d0a7..568bc8e 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -812,6 +812,16 @@ xfs_inode_alloc( ASSERT(list_empty(&ip->i_reclaim)); ASSERT(completion_done(&ip->i_flush)); + /* + * initialise the VFS inode here to get failures + * out of the way early. + */ + if (!inode_init_always(mp->m_super, VFS_I(ip))) { + kmem_zone_free(xfs_inode_zone, ip); + return NULL; + } + + /* initialise the xfs inode */ ip->i_ino = ino; ip->i_mount = mp; ip->i_blkno = 0; @@ -1086,6 +1096,7 @@ xfs_ialloc( uint flags; int error; timespec_t tv; + int filestreams = 0; /* * Call the space management code to pick @@ -1093,9 +1104,8 @@ xfs_ialloc( */ error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc, ialloc_context, call_again, &ino); - if (error != 0) { + if (error) return error; - } if (*call_again || ino == NULLFSINO) { *ipp = NULL; return 0; @@ -1109,9 +1119,8 @@ xfs_ialloc( */ error = xfs_trans_iget(tp->t_mountp, tp, ino, XFS_IGET_CREATE, XFS_ILOCK_EXCL, &ip); - if (error != 0) { + if (error) return error; - } ASSERT(ip != NULL); ip->i_d.di_mode = (__uint16_t)mode; @@ -1192,13 +1201,12 @@ xfs_ialloc( flags |= XFS_ILOG_DEV; break; case S_IFREG: - if (pip && xfs_inode_is_filestream(pip)) { - error = xfs_filestream_associate(pip, ip); - if (error < 0) - return -error; - if (!error) - xfs_iflags_set(ip, XFS_IFILESTREAM); - } + /* + * we can't set up filestreams until after the VFS inode + * is set up properly. + */ + if (pip && xfs_inode_is_filestream(pip)) + filestreams = 1; /* fall through */ case S_IFDIR: if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) { @@ -1264,6 +1272,15 @@ xfs_ialloc( /* now that we have an i_mode we can setup inode ops and unlock */ xfs_setup_inode(ip); + /* now we have set up the vfs inode we can associate the filestream */ + if (filestreams) { + error = xfs_filestream_associate(pip, ip); + if (error < 0) + return -error; + if (!error) + xfs_iflags_set(ip, XFS_IFILESTREAM); + } + *ipp = ip; return 0; } @@ -2654,6 +2671,10 @@ xfs_idestroy_fork( * It must free the inode itself and any buffers allocated for * if_extents/if_data and if_broot. It must also free the lock * associated with the inode. + * + * Note: because we don't initialise everything on reallocation out + * of the zone, we must ensure we nullify everything correctly before + * freeing the structure. */ void xfs_idestroy( diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 813c9b0..9c788ab 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -195,7 +195,6 @@ typedef struct xfs_inode { /* Inode linking and identification information. */ struct xfs_mount *i_mount; /* fs mount struct ptr */ struct list_head i_reclaim; /* reclaim list */ - struct inode *i_vnode; /* vnode backpointer */ struct xfs_dquot *i_udquot; /* user dquot */ struct xfs_dquot *i_gdquot; /* group dquot */ @@ -229,6 +228,10 @@ typedef struct xfs_inode { xfs_fsize_t i_size; /* in-memory size */ xfs_fsize_t i_new_size; /* size when write completes */ atomic_t i_iocount; /* outstanding I/O count */ + + /* VFS inode */ + struct inode i_vnode; /* embedded VFS inode */ + /* Trace buffers per inode. */ #ifdef XFS_INODE_TRACE struct ktrace *i_trace; /* general inode trace */ @@ -256,13 +259,13 @@ typedef struct xfs_inode { /* Convert from vfs inode to xfs inode */ static inline struct xfs_inode *XFS_I(struct inode *inode) { - return (struct xfs_inode *)inode->i_private; + return container_of(inode, struct xfs_inode, i_vnode); } /* convert from xfs inode to vfs inode */ static inline struct inode *VFS_I(struct xfs_inode *ip) { - return (struct inode *)ip->i_vnode; + return &ip->i_vnode; } /* diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c index 6bce4ad..c595432 100644 --- a/fs/xfs/xfs_vnodeops.c +++ b/fs/xfs/xfs_vnodeops.c @@ -2800,6 +2800,7 @@ xfs_reclaim( if (!ip->i_update_core && (ip->i_itemp == NULL)) { xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_iflock(ip); + xfs_iflags_set(ip, XFS_IRECLAIMABLE); return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC); } else { xfs_mount_t *mp = ip->i_mount; @@ -2808,8 +2809,6 @@ xfs_reclaim( XFS_MOUNT_ILOCK(mp); spin_lock(&ip->i_flags_lock); __xfs_iflags_set(ip, XFS_IRECLAIMABLE); - VFS_I(ip)->i_private = NULL; - ip->i_vnode = NULL; spin_unlock(&ip->i_flags_lock); list_add_tail(&ip->i_reclaim, &mp->m_del_inodes); XFS_MOUNT_IUNLOCK(mp); @@ -2824,10 +2823,6 @@ xfs_finish_reclaim( int sync_mode) { xfs_perag_t *pag = xfs_get_perag(ip->i_mount, ip->i_ino); - struct inode *vp = VFS_I(ip); - - if (vp && VN_BAD(vp)) - goto reclaim; /* The hash lock here protects a thread in xfs_iget_core from * racing with us on linking the inode back with a vnode. @@ -2837,7 +2832,7 @@ xfs_finish_reclaim( write_lock(&pag->pag_ici_lock); spin_lock(&ip->i_flags_lock); if (__xfs_iflags_test(ip, XFS_IRECLAIM) || - (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) && vp == NULL)) { + !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) { spin_unlock(&ip->i_flags_lock); write_unlock(&pag->pag_ici_lock); if (locked) { @@ -2871,15 +2866,13 @@ xfs_finish_reclaim( * In the case of a forced shutdown we rely on xfs_iflush() to * wait for the inode to be unpinned before returning an error. */ - if (xfs_iflush(ip, sync_mode) == 0) { + if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) { /* synchronize with xfs_iflush_done */ xfs_iflock(ip); xfs_ifunlock(ip); } xfs_iunlock(ip, XFS_ILOCK_EXCL); - - reclaim: xfs_ireclaim(ip); return 0; } ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-09-16 4:48 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-09-13 14:10 [PATCH 0/3] XFS: Combine Linux and XFS inodes Dave Chinner 2008-09-13 14:10 ` [PATCH 1/3] Inode: Allow external initialisers Dave Chinner 2008-09-13 14:10 ` [PATCH 2/3] Inode: Allow external list initialisation Dave Chinner 2008-09-13 14:10 ` [PATCH 3/3] XFS: Combine the XFS and Linux inodes V2 Dave Chinner 2008-09-14 13:31 ` Christoph Hellwig 2008-09-16 4:47 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).