* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
2008-03-25 19:53 ` Andrew Morton
@ 2008-03-25 22:01 ` Trond Myklebust
[not found] ` <E1JeKRA-0002JM-Ot@localhost>
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2008-03-25 22:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jan Kara, dgc, wfg, linux-kernel
On Tue, 2008-03-25 at 12:53 -0700, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
>
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> >
> > ---
> > fs/drop_caches.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >
> > static void drop_pagecache_sb(struct super_block *sb)
> > {
> > - struct inode *inode;
> > + struct inode *inode, *toput_inode = NULL;
> >
> > spin_lock(&inode_lock);
> > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > continue;
>
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.
>
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > + iput(toput_inode);
> > + toput_inode = inode;
> > + spin_lock(&inode_lock);
> > }
> > spin_unlock(&inode_lock);
> > + iput(toput_inode);
> > }
> >
> > void drop_pagecache(void)
>
> hrm. So we have a random ref on an inode without holding inode_lock. If
> we race with invalidate_list() we end up with an inode stuck on s_inodes
> and "Self-destruct in 5 seconds. Have a nice day...", don't we?
Calling drop_pagecache_sb() without having a reference to 'sb'? Surely
not...
Trond
* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
[not found] ` <E1JeKRA-0002JM-Ot@localhost>
@ 2008-03-26 1:28 ` Fengguang Wu
0 siblings, 0 replies; 7+ messages in thread
From: Fengguang Wu @ 2008-03-26 1:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jan Kara, dgc, linux-kernel
On Tue, Mar 25, 2008 at 12:53:54PM -0700, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
>
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> >
> > ---
> > fs/drop_caches.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >
> > static void drop_pagecache_sb(struct super_block *sb)
> > {
> > - struct inode *inode;
> > + struct inode *inode, *toput_inode = NULL;
> >
> > spin_lock(&inode_lock);
> > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > continue;
>
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.
Good catch!
There are 25k open inodes on my desktop, and only about 10% of them have cached pages:
% cat /proc/sys/fs/inode-state
25395 129 0 0 0 0 0
# wc -l /proc/filecache
2542 /proc/filecache
+ if (!inode->i_mapping || !inode->i_mapping->nrpages)
+ continue;
* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
2008-03-25 19:53 ` Andrew Morton
2008-03-25 22:01 ` Trond Myklebust
[not found] ` <E1JeKRA-0002JM-Ot@localhost>
@ 2008-03-26 9:31 ` Jan Kara
2008-03-26 9:33 ` [PATCH] vfs: Skip inodes without pages to free " Jan Kara
3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2008-03-26 9:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: dgc, wfg, linux-kernel
On Tue 25-03-08 12:53:54, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
>
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> >
> > ---
> > fs/drop_caches.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >
> > static void drop_pagecache_sb(struct super_block *sb)
> > {
> > - struct inode *inode;
> > + struct inode *inode, *toput_inode = NULL;
> >
> > spin_lock(&inode_lock);
> > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > continue;
>
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.
Good idea. I'll send a patch in a minute.
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > + iput(toput_inode);
> > + toput_inode = inode;
> > + spin_lock(&inode_lock);
> > }
> > spin_unlock(&inode_lock);
> > + iput(toput_inode);
> > }
> >
> > void drop_pagecache(void)
>
> hrm. So we have a random ref on an inode without holding inode_lock. If
> we race with invalidate_list() we end up with an inode stuck on s_inodes
> and "Self-destruct in 5 seconds. Have a nice day...", don't we?
We hold s_umount for reading, so we should be safe against a concurrent
umount. We could possibly race with invalidate_list() called from
check_disk_change(), but removing media without unmounting is bad behavior
anyway. So I think we are fine.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
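The locking pattern discussed in this thread can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: struct node, list_lock, node_get_locked(), node_put(), node_invalidate() and drop_all() are hypothetical names, a pthread mutex stands in for inode_lock, and a plain linked list stands in for sb->s_inodes. It only illustrates the ordering: take a reference under the lock, drop the lock for the slow work, and put the previous reference while the lock is not held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode; all field names are illustrative. */
struct node {
    struct node *next;  /* list linkage, protected by list_lock */
    int refcount;       /* protected by list_lock */
    int nrpages;        /* cached pages that could be dropped */
    int freeing;        /* models I_FREEING|I_WILL_FREE */
};

/* Stands in for inode_lock in this model. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference; the caller must hold list_lock (compare __iget()). */
static void node_get_locked(struct node *n)
{
    n->refcount++;
}

/*
 * Drop a reference. This takes list_lock itself, so it must be called
 * with the lock released. NULL is a no-op, as with iput().
 */
static void node_put(struct node *n)
{
    if (!n)
        return;
    pthread_mutex_lock(&list_lock);
    n->refcount--;
    pthread_mutex_unlock(&list_lock);
}

/* Slow work done without the lock (models __invalidate_mapping_pages()). */
static void node_invalidate(struct node *n)
{
    n->nrpages = 0;
}

/* Walk the list and return the number of nodes whose pages were dropped. */
static int drop_all(struct node *head)
{
    struct node *n, *toput = NULL;
    int dropped = 0;

    pthread_mutex_lock(&list_lock);
    for (n = head; n; n = n->next) {
        if (n->freeing)
            continue;
        if (n->nrpages == 0)        /* nothing to free, skip cheaply */
            continue;
        node_get_locked(n);         /* pin n before dropping the lock */
        pthread_mutex_unlock(&list_lock);
        node_invalidate(n);
        dropped++;
        node_put(toput);            /* previous node, put while unlocked */
        toput = n;                  /* n stays pinned until we hold the lock again */
        pthread_mutex_lock(&list_lock);
    }
    pthread_mutex_unlock(&list_lock);
    node_put(toput);                /* last pinned node */
    return dropped;
}
```

For a three-node list with nrpages of 2, 0 and 5, drop_all() returns 2 and leaves every refcount at its starting value. The put is deferred by one iteration so that the node being stepped over is still pinned when the lock is re-taken and its next pointer is followed.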
* [PATCH] vfs: Skip inodes without pages to free in drop_pagecache_sb()
2008-03-25 19:53 ` Andrew Morton
` (2 preceding siblings ...)
2008-03-26 9:31 ` Jan Kara
@ 2008-03-26 9:33 ` Jan Kara
3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2008-03-26 9:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jan Kara, dgc, wfg, linux-kernel
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/drop_caches.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index f5aae26..7327a42 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -20,6 +20,8 @@ static void drop_pagecache_sb(struct super_block *sb)
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
if (inode->i_state & (I_FREEING|I_WILL_FREE))
continue;
+ if (inode->i_mapping->nrpages == 0)
+ continue;
__iget(inode);
spin_unlock(&inode_lock);
__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
--
1.5.2.4