* [PATCH, RFC] prune back iprune_sem
From: Christoph Hellwig @ 2010-11-02 18:45 UTC
  To: viro, akpm; +Cc: linux-fsdevel

iprune_sem is continuously giving us lockdep warnings because we take it in
read mode in the reclaim path, but we're also doing non-NOFS allocations under
it taken in write mode.
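
Spelled out, the inversion is (a simplified sketch of the call chains,
based on how prune_icache() took the lock at the time):

	/*
	 * umount / invalidate side:        memory reclaim side:
	 *
	 *   down_write(&iprune_sem);         shrink_icache_memory()
	 *   kmalloc(GFP_KERNEL);               -> prune_icache()
	 *     -> enters direct reclaim           -> down_read(&iprune_sem)
	 *     -> shrink_icache_memory()
	 *     -> prune_icache()
	 *     -> down_read(&iprune_sem)   <- blocks on our own write lock
	 */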

Taking a bit deeper look at it, I think it's fixable quite trivially:

 - for invalidate_inodes we do not need iprune_sem at all.  We have an active
   reference on the superblock, so the filesystem is not going away until it
   has finished.
 - for evict_inodes we do need it, to make sure prune_icache has done its
   work before we tear down the superblock.  But there is no reason to
   hold it over the actual reclaim operation - it's enough to cycle through
   it after the actual reclaim to make sure we wait for any pending
   prune_icache to complete.

Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/inode.c b/fs/inode.c
index ae2727a..cfa7722 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -492,8 +492,6 @@ void evict_inodes(struct super_block *sb)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (atomic_read(&inode->i_count))
@@ -518,6 +516,13 @@ void evict_inodes(struct super_block *sb)
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
+
+	/*
+	 * Cycle through iprune_sem to make sure any inode that prune_icache
+	 * moved off the list before we took the lock has been fully torn
+	 * down.
+	 */
+	down_write(&iprune_sem);
 	up_write(&iprune_sem);
 }
 
@@ -534,8 +539,6 @@ int invalidate_inodes(struct super_block *sb)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
@@ -559,7 +562,6 @@ int invalidate_inodes(struct super_block *sb)
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
-	up_write(&iprune_sem);
 
 	return busy;
 }
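
For illustration, the down_write()/up_write() pair added at the end of
evict_inodes() is a "cycle the lock" barrier.  The same idiom as a
self-contained userspace sketch, using POSIX rwlocks and made-up names:

	#include <pthread.h>

	static pthread_rwlock_t prune_lock = PTHREAD_RWLOCK_INITIALIZER;

	/* reclaim side: holds the lock shared while disposing entries */
	void prune_cache(void)
	{
		pthread_rwlock_rdlock(&prune_lock);
		/* ... dispose of unreferenced cache entries ... */
		pthread_rwlock_unlock(&prune_lock);
	}

	/* teardown side: cycling the lock drains all earlier readers */
	void wait_for_pruners(void)
	{
		pthread_rwlock_wrlock(&prune_lock); /* waits for every reader */
		pthread_rwlock_unlock(&prune_lock); /* protects nothing; we only waited */
	}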


* Re: [PATCH, RFC] prune back iprune_sem
From: Jan Kara @ 2010-11-04 23:32 UTC
  To: Christoph Hellwig; +Cc: viro, akpm, linux-fsdevel

On Tue 02-11-10 19:45:36, Christoph Hellwig wrote:
> iprune_sem is continuously giving us lockdep warnings because we take it in
> read mode in the reclaim path, but we're also doing non-NOFS allocations under
> it taken in write mode.
> 
> Taking a bit deeper look at it, I think it's fixable quite trivially:
> 
>  - for invalidate_inodes we do not need iprune_sem at all.  We have an active
>    reference on the superblock, so the filesystem is not going away until it
>    has finished.
>  - for evict_inodes we do need it, to make sure prune_icache has done its
>    work before we tear down the superblock.  But there is no reason to
>    hold it over the actual reclaim operation - it's enough to cycle through
>    it after the actual reclaim to make sure we wait for any pending
>    prune_icache to complete.
  The patch is OK, but it's kind of subtle that evict_inodes() can now skip
an inode in the LRU list because prune_icache() is just processing it and
so it has an elevated i_count. Everything will work out fine because
MS_ACTIVE is cleared, so iput() will destroy the inode, and prune_icache()
will then just continue with the next inode in the inode_lru list. But as
I said above, it's subtle...
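
To spell out why it works out: once the superblock has lost MS_ACTIVE,
the final iput() frees the inode instead of parking it on the LRU.
Roughly (a simplified paraphrase, not an exact quote of fs/inode.c):

	static void iput_final(struct inode *inode)
	{
		struct super_block *sb = inode->i_sb;

		if (sb->s_flags & MS_ACTIVE) {
			/* normal case: cache the unused inode on the LRU */
			inode_lru_list_add(inode);
			return;
		}

		/* sb is going down: tear the inode down right away */
		inode->i_state |= I_FREEING;
		evict(inode);
	}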

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH, RFC] prune back iprune_sem
From: Christoph Hellwig @ 2011-02-15 10:29 UTC
  To: viro, akpm; +Cc: linux-fsdevel

ping?

On Tue, Nov 02, 2010 at 07:45:36PM +0100, Christoph Hellwig wrote:
> iprune_sem is continuously giving us lockdep warnings because we take it in
> read mode in the reclaim path, but we're also doing non-NOFS allocations under
> it taken in write mode.
> 
> Taking a bit deeper look at it, I think it's fixable quite trivially:
> 
>  - for invalidate_inodes we do not need iprune_sem at all.  We have an active
>    reference on the superblock, so the filesystem is not going away until it
>    has finished.
>  - for evict_inodes we do need it, to make sure prune_icache has done its
>    work before we tear down the superblock.  But there is no reason to
>    hold it over the actual reclaim operation - it's enough to cycle through
>    it after the actual reclaim to make sure we wait for any pending
>    prune_icache to complete.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> [patch body snipped; identical to the original posting above]
---end quoted text---


* Re: [PATCH, RFC] prune back iprune_sem
From: Jan Kara @ 2011-02-15 14:49 UTC
  To: Christoph Hellwig; +Cc: viro, akpm, linux-fsdevel

  Hi,

On Tue 15-02-11 11:29:16, Christoph Hellwig wrote:
> On Tue, Nov 02, 2010 at 07:45:36PM +0100, Christoph Hellwig wrote:
> > iprune_sem is continuously giving us lockdep warnings because we take it in
> > read mode in the reclaim path, but we're also doing non-NOFS allocations under
> > it taken in write mode.
> > 
> > Taking a bit deeper look at it, I think it's fixable quite trivially:
> > 
> >  - for invalidate_inodes we do not need iprune_sem at all.  We have an
> >  active reference on the superblock, so the filesystem is not going
> >  away until it has finished.
> >  - for evict_inodes we do need it, to make sure prune_icache has done
> >  its work before we tear down the superblock.  But there is no reason
> >  to hold it over the actual reclaim operation - it's enough to cycle
> >  through it after the actual reclaim to make sure we wait for any
> >  pending prune_icache to complete.
  I just wonder: with this change, evict_inodes() can start seeing
inodes that are just being freed by prune_icache(), and thus trigger the
WARN_ON() in evict_inodes():
                if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                        WARN_ON(1);
                        continue;
                }
  Otherwise, the change looks safe to me. BTW, iprune_sem is now used
only so that evict_inodes() can wait for prune_icache() to finish, so maybe
we could have something simpler for that?
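
One conceivable simpler scheme, purely as a sketch (hypothetical names,
not something proposed in the thread; it has the same "waits only for
pruners that already started" semantics as cycling the rwsem):

	#include <linux/atomic.h>
	#include <linux/wait.h>

	static atomic_t prune_count = ATOMIC_INIT(0);
	static DECLARE_WAIT_QUEUE_HEAD(prune_wait);

	static void prune_icache(int nr_to_scan)
	{
		atomic_inc(&prune_count);
		/* ... shrink the inode LRU as before ... */
		if (atomic_dec_and_test(&prune_count))
			wake_up(&prune_wait);
	}

	void evict_inodes_barrier(void)
	{
		/* sleep until no prune_icache() is in flight */
		wait_event(prune_wait, atomic_read(&prune_count) == 0);
	}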

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH, RFC] prune back iprune_sem
From: Christoph Hellwig @ 2011-02-15 14:53 UTC
  To: Jan Kara; +Cc: Christoph Hellwig, viro, akpm, linux-fsdevel

On Tue, Feb 15, 2011 at 03:49:05PM +0100, Jan Kara wrote:
> inodes that are just being freed by prune_icache(), and thus trigger the
> WARN_ON() in evict_inodes():
>                 if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
>                         WARN_ON(1);
>                         continue;
>                 }

That WARN_ON didn't exist when I submitted the patch three months ago,
but yes, it should be removed now.

>   Otherwise, the change looks safe to me. BTW, iprune_sem is now used
> only so that evict_inodes() can wait for prune_icache() to finish, so maybe
> we could have something simpler for that?

I can't think of anything simpler.  The proper way to do it would be to
make the inode LRU per-sb, just like the dentry LRU list.  That way we
always hold a reference to the superblock while pruning inodes from the
LRU, and all associated issues go away.  Dave had a patch for this as
part of implementing a

	for_each_sb {
		prune dcache;
		prune icache;
		prune fs-specific cache;
	}

algorithm.  I still think it's the right way to go, but it fell by the
wayside, and I really need a way to fix the lockdep warning / rare
deadlock the current scheme causes for XFS.
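
As a sketch of that direction (field and function names are hypothetical;
Dave's actual patches may differ):

	struct super_block {
		/* ... existing fields ... */
		struct list_head	s_inode_lru;	/* unused inodes of this sb */
		spinlock_t		s_inode_lru_lock;
		long			s_nr_inodes_unused;
	};

	/*
	 * The shrinker walks superblocks, pinning each one so the
	 * filesystem cannot go away mid-prune - no global iprune_sem
	 * is needed to keep it alive.
	 */
	static void prune_super(struct super_block *sb, int nr_to_scan)
	{
		if (!grab_super(sb))	/* hypothetical: take an sb reference */
			return;
		shrink_dcache_sb(sb);			/* prune dcache */
		prune_icache_sb(sb, nr_to_scan);	/* hypothetical per-sb icache prune */
		/* fs-specific cache pruning would hook in here */
		drop_super(sb);
	}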


