* [PATCH, RFC] prune back iprune_sem
@ 2010-11-02 18:45 Christoph Hellwig
2010-11-04 23:32 ` Jan Kara
2011-02-15 10:29 ` Christoph Hellwig
0 siblings, 2 replies; 5+ messages in thread
From: Christoph Hellwig @ 2010-11-02 18:45 UTC (permalink / raw)
To: viro, akpm; +Cc: linux-fsdevel

iprune_sem is continuously giving us lockdep warnings because we take it in
read mode in the reclaim path, but we also do non-NOFS allocations while
holding it in write mode.

Taking a deeper look at it, I think it's fixable quite trivially:

- for invalidate_inodes we do not need iprune_sem at all. We have an active
  reference on the superblock, so the filesystem is not going away until it
  has finished.
- for evict_inodes we do need it, to make sure prune_icache has done its
  work before we tear down the superblock. But there is no reason to
  hold it over the actual reclaim operation - it's enough to cycle through
  it after the actual reclaim to make sure we wait for any pending
  prune_icache to complete.
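
A minimal userspace sketch of the inversion described above, assuming a POSIX rwlock as a stand-in for iprune_sem. The helpers model_reclaim(), model_alloc() and model_old_evict_alloc() are hypothetical: they stand for the reclaim path that takes iprune_sem for reading, an allocation that may recurse into filesystem reclaim unless it is NOFS, and the old pattern of allocating while iprune_sem is held for writing. Where the kernel would block in down_read(), the sketch calls pthread_rwlock_tryrdlock() so it reports the would-be deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for iprune_sem. */
static pthread_rwlock_t model_iprune_sem = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Hypothetical stand-in for the reclaim path (prune_icache() takes
 * iprune_sem for reading). Returns 0 if the read lock was available,
 * or EBUSY where the kernel would have blocked in down_read().
 */
static int model_reclaim(void)
{
	int err = pthread_rwlock_tryrdlock(&model_iprune_sem);

	if (err == 0)
		pthread_rwlock_unlock(&model_iprune_sem);
	return err;
}

/* Hypothetical allocator: a non-NOFS allocation may recurse into reclaim. */
static int model_alloc(bool nofs)
{
	return nofs ? 0 : model_reclaim();
}

/* Old pattern: allocate while holding iprune_sem for writing. */
static int model_old_evict_alloc(bool nofs)
{
	int err;

	pthread_rwlock_wrlock(&model_iprune_sem);
	err = model_alloc(nofs);
	pthread_rwlock_unlock(&model_iprune_sem);
	return err;
}
```

Here model_old_evict_alloc(false) returns EBUSY because the nested reclaim needs the lock that its own caller holds for writing; a NOFS allocation, or reclaim run without the write lock held, gets through. This is, roughly, the dependency lockdep objects to.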
Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/inode.c b/fs/inode.c
index ae2727a..cfa7722 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -492,8 +492,6 @@ void evict_inodes(struct super_block *sb)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (atomic_read(&inode->i_count))
@@ -518,6 +516,13 @@ void evict_inodes(struct super_block *sb)
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
+
+	/*
+	 * Cycle through iprune_sem to make sure any inode that prune_icache
+	 * moved off the list before we took the lock has been fully torn
+	 * down.
+	 */
+	down_write(&iprune_sem);
 	up_write(&iprune_sem);
 }
 
@@ -534,8 +539,6 @@ int invalidate_inodes(struct super_block *sb)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
@@ -559,7 +562,6 @@ int invalidate_inodes(struct super_block *sb)
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
-	up_write(&iprune_sem);
 
 	return busy;
 }
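
The "cycle through it" step in evict_inodes() relies on a general property of reader/writer locks: acquiring the lock for writing cannot succeed while any reader is still inside, so a down_write() immediately followed by up_write() acts as a barrier that waits out every reader that got in first. The sketch below models this with POSIX threads; it is an illustration under assumptions, not the kernel code. A pthread rwlock stands in for iprune_sem, and model_prune_icache() and model_evict_inodes() are hypothetical stand-ins for prune_icache() (working with the lock held for reading) and the patched evict_inodes().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for iprune_sem. */
static pthread_rwlock_t model_sem = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int prune_started;
static atomic_int prune_finished;

/* Models prune_icache(): does its work with the semaphore held for reading. */
static void *model_prune_icache(void *arg)
{
	struct timespec work = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	(void)arg;
	pthread_rwlock_rdlock(&model_sem);
	atomic_store(&prune_started, 1);
	nanosleep(&work, NULL);		/* pretend to tear down inodes */
	atomic_store(&prune_finished, 1);
	pthread_rwlock_unlock(&model_sem);
	return NULL;
}

/*
 * Models the patched evict_inodes(): the list walk runs without the
 * semaphore; afterwards the write lock is taken and dropped at once,
 * which cannot complete until every reader already inside has left.
 * Returns 1 if the pruner had finished when the barrier returned.
 */
static int model_evict_inodes(void)
{
	pthread_t pruner;
	int finished;

	atomic_store(&prune_started, 0);
	atomic_store(&prune_finished, 0);
	if (pthread_create(&pruner, NULL, model_prune_icache, NULL) != 0)
		return -1;
	while (!atomic_load(&prune_started))
		sched_yield();		/* ensure the pruner is mid-pass */

	/* ... list walk and dispose_list() would run here, lock-free ... */

	pthread_rwlock_wrlock(&model_sem);	/* down_write() */
	pthread_rwlock_unlock(&model_sem);	/* up_write() */
	finished = atomic_load(&prune_finished);
	pthread_join(pruner, NULL);
	return finished;
}
```

Note that the barrier only waits for readers that were already inside when the write lock was requested; it does not stop new readers from entering afterwards.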
* Re: [PATCH, RFC] prune back iprune_sem
From: Jan Kara @ 2010-11-04 23:32 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: viro, akpm, linux-fsdevel
On Tue 02-11-10 19:45:36, Christoph Hellwig wrote:
> iprune_sem is continuously giving us lockdep warnings because we take it in
> read mode in the reclaim path, but we also do non-NOFS allocations while
> holding it in write mode.
>
> Taking a deeper look at it, I think it's fixable quite trivially:
>
> - for invalidate_inodes we do not need iprune_sem at all. We have an active
> reference on the superblock, so the filesystem is not going away until it
> has finished.
> - for evict_inodes we do need it, to make sure prune_icache has done its
> work before we tear down the superblock. But there is no reason to
> hold it over the actual reclaim operation - it's enough to cycle through
> it after the actual reclaim to make sure we wait for any pending
> prune_icache to complete.

The patch is OK, but it's somewhat subtle that evict_inodes() can now skip
an inode on the LRU list because prune_icache() is currently processing it
and has therefore elevated its i_count. Everything will work out fine
because MS_ACTIVE is cleared, so iput() will destroy the inode, and
prune_icache() will then just continue with the next inode on the inode_lru
list. But as I said above, it's subtle...

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
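
Jan's scenario can be traced with a deliberately simplified, single-threaded model. Everything here is hypothetical (struct model_sb, struct model_inode, model_evict_inodes(), model_iput(), model_umount_race()); it only captures the two rules he relies on: evict_inodes() skips inodes with a non-zero i_count, and the final iput() destroys the inode instead of caching it once MS_ACTIVE has been cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the kernel's structures. */
struct model_sb {
	bool active;			/* stands in for MS_ACTIVE */
};

struct model_inode {
	int count;			/* stands in for i_count */
	bool destroyed;
	struct model_sb *sb;
};

/* Like evict_inodes(): skip referenced inodes, evict the rest. */
static int model_evict_inodes(struct model_inode *inodes, int n)
{
	int skipped = 0;

	for (int i = 0; i < n; i++) {
		if (inodes[i].destroyed)
			continue;
		if (inodes[i].count) {
			skipped++;	/* e.g. prune_icache() holds a reference */
			continue;
		}
		inodes[i].destroyed = true;
	}
	return skipped;
}

/* Like the final iput(): keep the inode cached only if the sb is active. */
static void model_iput(struct model_inode *inode)
{
	if (--inode->count > 0)
		return;
	if (!inode->sb->active)
		inode->destroyed = true;
	/* otherwise it would go back onto the inode LRU */
}

/* The unmount race; returns true if no inode is left behind. */
static bool model_umount_race(void)
{
	struct model_sb sb = { .active = true };
	struct model_inode inodes[2] = {
		{ .count = 0, .sb = &sb },
		{ .count = 0, .sb = &sb },
	};
	int skipped;

	inodes[1].count++;	/* prune_icache() takes a temporary reference */
	sb.active = false;	/* unmount clears MS_ACTIVE first */
	skipped = model_evict_inodes(inodes, 2);
	model_iput(&inodes[1]);	/* prune_icache() drops its reference */
	return skipped == 1 && inodes[0].destroyed && inodes[1].destroyed;
}
```

The skipped inode is not leaked: prune_icache() drops the last reference after unmount has cleared MS_ACTIVE, so the inode is torn down on that path instead.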
* Re: [PATCH, RFC] prune back iprune_sem
From: Christoph Hellwig @ 2011-02-15 10:29 UTC (permalink / raw)
To: viro, akpm; +Cc: linux-fsdevel
ping?
On Tue, Nov 02, 2010 at 07:45:36PM +0100, Christoph Hellwig wrote:
> iprune_sem is continuously giving us lockdep warnings because we take it in
> read mode in the reclaim path, but we also do non-NOFS allocations while
> holding it in write mode.
>
> Taking a deeper look at it, I think it's fixable quite trivially:
>
> - for invalidate_inodes we do not need iprune_sem at all. We have an active
> reference on the superblock, so the filesystem is not going away until it
> has finished.
> - for evict_inodes we do need it, to make sure prune_icache has done its
> work before we tear down the superblock. But there is no reason to
> hold it over the actual reclaim operation - it's enough to cycle through
> it after the actual reclaim to make sure we wait for any pending
> prune_icache to complete.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH, RFC] prune back iprune_sem
From: Jan Kara @ 2011-02-15 14:49 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: viro, akpm, linux-fsdevel
Hi,
On Tue 15-02-11 11:29:16, Christoph Hellwig wrote:
> On Tue, Nov 02, 2010 at 07:45:36PM +0100, Christoph Hellwig wrote:
> > iprune_sem is continuously giving us lockdep warnings because we take it in
> > read mode in the reclaim path, but we also do non-NOFS allocations while
> > holding it in write mode.
> >
> > Taking a deeper look at it, I think it's fixable quite trivially:
> >
> > - for invalidate_inodes we do not need iprune_sem at all. We have an
> > active reference on the superblock, so the filesystem is not going
> > away until it has finished.
> > - for evict_inodes we do need it, to make sure prune_icache has done
> > its work before we tear down the superblock. But there is no reason
> > to hold it over the actual reclaim operation - it's enough to cycle
> > through it after the actual reclaim to make sure we wait for any
> > pending prune_icache to complete.

I just wonder: with this change, evict_inodes() can start seeing inodes
that are just being freed by prune_icache(). Thus we can trigger the
WARN_ON() in evict_inodes():
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			WARN_ON(1);
			continue;
		}

Otherwise, the change looks safe to me. BTW, the iprune_sem is now used
only so that evict_inodes() can wait for prune_icache() to finish so maybe
we could have something simpler for that?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH, RFC] prune back iprune_sem
From: Christoph Hellwig @ 2011-02-15 14:53 UTC (permalink / raw)
To: Jan Kara; +Cc: Christoph Hellwig, viro, akpm, linux-fsdevel
On Tue, Feb 15, 2011 at 03:49:05PM +0100, Jan Kara wrote:
> inodes, that are just being freed by prune_icache(). Thus we can trigger
> WARN_ON() in evict_inodes():
> 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
> 			WARN_ON(1);
> 			continue;
> 		}

That WARN_ON didn't exist when I submitted the patch three months ago,
but yes, it should be removed now.

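A sketch of how the skip could look once the WARN_ON is dropped, written in plain C with hypothetical names (M_I_NEW and friends, struct m_inode, m_evict_pass()) so it compiles on its own; it is not the actual fs/inode.c code. The point is that, with iprune_sem no longer held across the walk, an inode already marked I_FREEING by prune_icache() is an expected case, so it is skipped silently.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel's I_NEW etc. are defined elsewhere. */
enum {
	M_I_NEW		= 1 << 0,
	M_I_FREEING	= 1 << 1,
	M_I_WILL_FREE	= 1 << 2,
};

struct m_inode {
	unsigned int state;	/* stands in for i_state */
	int count;		/* stands in for i_count */
};

/* Returns how many inodes this pass would hand to dispose_list(). */
static int m_evict_pass(struct m_inode *v, int n)
{
	int disposed = 0;

	for (int i = 0; i < n; i++) {
		if (v[i].count)
			continue;
		/*
		 * Without iprune_sem held over the walk, an inode that
		 * prune_icache() is already freeing is an expected sight,
		 * so skip it quietly instead of warning.
		 */
		if (v[i].state & (M_I_NEW | M_I_FREEING | M_I_WILL_FREE))
			continue;
		v[i].state |= M_I_FREEING;	/* claim it for disposal */
		disposed++;
	}
	return disposed;
}
```
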
> Otherwise, the change looks safe to me. BTW, the iprune_sem is now used
> only so that evict_inodes() can wait for prune_icache() to finish so maybe
> we could have something simpler for that?

I can't think of anything simple. The proper way to do it would be to
make the inode LRU per-sb, just like the dentry LRU list. That way we
always hold a reference to the superblock while pruning inodes from the
LRU, and all the associated issues go away. Dave had a patch for this as
part of implementing a

	for_each_sb {
		prune dcache;
		prune icache;
		prune fs-specific cache;
	}

algorithm. I still think it's the right way to go, but it fell by the
wayside, and I really need a way to fix the lockdep warning / rare
deadlock the current scheme causes for XFS.
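
A rough, single-threaded model of the per-sb for_each_sb loop described above. All names are hypothetical, and the plain refs counter stands in for whatever superblock reference the real code would take (which in the kernel would need proper locking). The idea it illustrates: each superblock is pinned while its caches are pruned, so pruning needs no global semaphore to keep unmount away, and a superblock that is already being torn down is simply skipped. Pruning the dcache first follows Dave's ordering; dentries hold references to inodes, so freeing them first makes more inodes reclaimable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-superblock state; field names are illustrative only. */
struct m_super {
	int refs;		/* 0 means the sb is being torn down */
	int dentries;		/* per-sb dentry LRU length */
	int inodes;		/* per-sb inode LRU length (the proposal) */
	int fs_objs;		/* fs-specific cache length */
};

/* Free up to *budget objects from one LRU. */
static void m_prune(int *lru, int *budget)
{
	int n = *lru < *budget ? *lru : *budget;

	*lru -= n;
	*budget -= n;
}

/* for_each_sb { prune dcache; prune icache; prune fs-specific cache; } */
static int m_prune_all(struct m_super *sbs, size_t nr, int budget)
{
	for (size_t i = 0; i < nr && budget > 0; i++) {
		struct m_super *sb = &sbs[i];

		if (sb->refs == 0)
			continue;	/* going away: leave it to unmount */
		sb->refs++;		/* pin: cannot be torn down under us */
		m_prune(&sb->dentries, &budget);	/* dentries pin inodes */
		m_prune(&sb->inodes, &budget);
		m_prune(&sb->fs_objs, &budget);
		sb->refs--;
	}
	return budget;		/* unused part of the budget */
}
```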