public inbox for linux-kernel@vger.kernel.org
* [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
@ 2008-03-25 18:12 Jan Kara
  2008-03-25 19:53 ` Andrew Morton
       [not found] ` <E1JeJkq-0001bZ-6Z@localhost>
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Kara @ 2008-03-25 18:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Chinner, Fengguang Wu, linux-kernel

Fix a longstanding lock inversion in drop_pagecache_sb() by dropping inode_lock
before calling __invalidate_mapping_pages(). We just have to make sure the
inode won't go away from under us by keeping a reference to it and putting
that reference only after we have safely resumed the scan of the inode
list. A bit tricky but not too bad...

Signed-off-by: Jan Kara <jack@suse.cz>
CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
CC: David Chinner <dgc@sgi.com>

---
 fs/drop_caches.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 59375ef..f5aae26 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -14,15 +14,21 @@ int sysctl_drop_caches;
 
 static void drop_pagecache_sb(struct super_block *sb)
 {
-	struct inode *inode;
+	struct inode *inode, *toput_inode = NULL;
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		if (inode->i_state & (I_FREEING|I_WILL_FREE))
 			continue;
+		__iget(inode);
+		spin_unlock(&inode_lock);
 		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
+		iput(toput_inode);
+		toput_inode = inode;
+		spin_lock(&inode_lock);
 	}
 	spin_unlock(&inode_lock);
+	iput(toput_inode);
 }
 
 void drop_pagecache(void)
-- 
1.5.2.4
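
The iteration scheme in the patch above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: `struct node`, `list_lock`, `node_put()`, `slow_invalidate()` and `drop_all()` are hypothetical names, a pthread mutex stands in for the inode_lock spinlock, and a plain counter stands in for the inode reference count. The point it tries to show is that the reference on the current element is kept until the lock has been retaken and the walk has moved on, so the element whose next pointer is read is still pinned.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode on the per-superblock list. */
struct node {
	struct node *next;
	int refcount;          /* protected by list_lock in this sketch */
	int freeing;           /* stand-in for I_FREEING|I_WILL_FREE */
	unsigned long nrpages; /* stand-in for cached pages */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference. Takes list_lock itself, so it must be called
 * with the lock released. Accepts NULL, like the first pass of the loop. */
static void node_put(struct node *n)
{
	if (!n)
		return;
	pthread_mutex_lock(&list_lock);
	n->refcount--;
	pthread_mutex_unlock(&list_lock);
}

/* Stand-in for the slow work that must not run under the lock. */
static void slow_invalidate(struct node *n)
{
	n->nrpages = 0;
}

/* Walk the list, doing the slow work unlocked; returns nodes processed. */
static int drop_all(struct node *head)
{
	struct node *n, *toput = NULL;
	int visited = 0;

	pthread_mutex_lock(&list_lock);
	for (n = head; n; n = n->next) {
		if (n->freeing)
			continue;
		n->refcount++;                  /* pin n while the lock is held */
		pthread_mutex_unlock(&list_lock);
		slow_invalidate(n);
		node_put(toput);                /* previous node is no longer needed */
		toput = n;                      /* keep n pinned until we have moved on */
		visited++;
		pthread_mutex_lock(&list_lock); /* n->next is read under the lock */
	}
	pthread_mutex_unlock(&list_lock);
	node_put(toput);                        /* release the last pinned node */
	return visited;
}
```

In the sketch, releasing the previous node only after the slow work on the current one mirrors the ordering of iput(toput_inode) in the patch. Nothing is ever freed here, so the sketch demonstrates the ordering only, not the lifetime rules of real inodes.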


* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
  2008-03-25 18:12 [PATCH] vfs: Fix lock inversion in drop_pagecache_sb() Jan Kara
@ 2008-03-25 19:53 ` Andrew Morton
  2008-03-25 22:01   ` Trond Myklebust
                     ` (3 more replies)
       [not found] ` <E1JeJkq-0001bZ-6Z@localhost>
  1 sibling, 4 replies; 7+ messages in thread
From: Andrew Morton @ 2008-03-25 19:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: dgc, wfg, linux-kernel

On Tue, 25 Mar 2008 19:12:27 +0100
Jan Kara <jack@suse.cz> wrote:

> Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> before calling __invalidate_mapping_pages(). We just have to make sure
> inode won't go away from under us by keeping reference to it and putting
> the reference only after we have safely resumed the scan of the inode
> list. A bit tricky but not too bad...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> CC: David Chinner <dgc@sgi.com>
> 
> ---
>  fs/drop_caches.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..f5aae26 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,21 @@ int sysctl_drop_caches;
>  
>  static void drop_pagecache_sb(struct super_block *sb)
>  {
> -	struct inode *inode;
> +	struct inode *inode, *toput_inode = NULL;
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
>  			continue;

OT: it might be worth having an `if (mapping->nrpages==0) continue' here.

> +		__iget(inode);
> +		spin_unlock(&inode_lock);
>  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> +		iput(toput_inode);
> +		toput_inode = inode;
> +		spin_lock(&inode_lock);
>  	}
>  	spin_unlock(&inode_lock);
> +	iput(toput_inode);
>  }
>  
>  void drop_pagecache(void)

hrm.  So we have a random ref on an inode without holding inode_lock.  If
we race with invalidate_list() we end up with an inode stuck on s_inodes
and "Self-destruct in 5 seconds.  Have a nice day...", don't we?

* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
  2008-03-25 19:53 ` Andrew Morton
@ 2008-03-25 22:01   ` Trond Myklebust
       [not found]   ` <E1JeKRA-0002JM-Ot@localhost>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2008-03-25 22:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, dgc, wfg, linux-kernel


On Tue, 2008-03-25 at 12:53 -0700, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> > 
> > ---
> >  fs/drop_caches.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >  
> >  static void drop_pagecache_sb(struct super_block *sb)
> >  {
> > -	struct inode *inode;
> > +	struct inode *inode, *toput_inode = NULL;
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> >  			continue;
> 
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.
> 
> > +		__iget(inode);
> > +		spin_unlock(&inode_lock);
> >  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > +		iput(toput_inode);
> > +		toput_inode = inode;
> > +		spin_lock(&inode_lock);
> >  	}
> >  	spin_unlock(&inode_lock);
> > +	iput(toput_inode);
> >  }
> >  
> >  void drop_pagecache(void)
> 
> hrm.  So we have a random ref on an inode without holding inode_lock.  If
> we race with invalidate_list() we end up with an inode stuck on s_inodes
> and "Self-destruct in 5 seconds.  Have a nice day...", don't we?

Calling drop_pagecache_sb() without having a reference to 'sb'? Surely
not...

Trond


* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
       [not found] ` <E1JeJkq-0001bZ-6Z@localhost>
@ 2008-03-26  0:44   ` Fengguang Wu
  0 siblings, 0 replies; 7+ messages in thread
From: Fengguang Wu @ 2008-03-26  0:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, David Chinner, linux-kernel

On Tue, Mar 25, 2008 at 07:12:27PM +0100, Jan Kara wrote:
> Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> before calling __invalidate_mapping_pages(). We just have to make sure
> inode won't go away from under us by keeping reference to it and putting
> the reference only after we have safely resumed the scan of the inode
> list. A bit tricky but not too bad...

Reviewed-by: Fengguang Wu <wfg@mail.ustc.edu.cn>

It's a handy trick for iterating through the list_head :-)
I have used it in my filecache code, and it works nicely.

Fengguang

> Signed-off-by: Jan Kara <jack@suse.cz>
> CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> CC: David Chinner <dgc@sgi.com>
> 
> ---
>  fs/drop_caches.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..f5aae26 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,21 @@ int sysctl_drop_caches;
>  
>  static void drop_pagecache_sb(struct super_block *sb)
>  {
> -	struct inode *inode;
> +	struct inode *inode, *toput_inode = NULL;
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
>  			continue;
> +		__iget(inode);
> +		spin_unlock(&inode_lock);
>  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> +		iput(toput_inode);
> +		toput_inode = inode;
> +		spin_lock(&inode_lock);
>  	}
>  	spin_unlock(&inode_lock);
> +	iput(toput_inode);
>  }
>  
>  void drop_pagecache(void)
> -- 
> 1.5.2.4
> 


* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
       [not found]   ` <E1JeKRA-0002JM-Ot@localhost>
@ 2008-03-26  1:28     ` Fengguang Wu
  0 siblings, 0 replies; 7+ messages in thread
From: Fengguang Wu @ 2008-03-26  1:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, dgc, linux-kernel

On Tue, Mar 25, 2008 at 12:53:54PM -0700, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> > 
> > ---
> >  fs/drop_caches.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >  
> >  static void drop_pagecache_sb(struct super_block *sb)
> >  {
> > -	struct inode *inode;
> > +	struct inode *inode, *toput_inode = NULL;
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> >  			continue;
> 
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.

Good catch!

There are 25k open inodes on my desktop, and only about 10% of them have cached pages:

                % cat /proc/sys/fs/inode-state   
                25395   129     0       0       0       0       0
                # wc -l /proc/filecache 
                2542 /proc/filecache

+               if (!inode->i_mapping || !inode->i_mapping->nrpages)
+                       continue;


* Re: [PATCH] vfs: Fix lock inversion in drop_pagecache_sb()
  2008-03-25 19:53 ` Andrew Morton
  2008-03-25 22:01   ` Trond Myklebust
       [not found]   ` <E1JeKRA-0002JM-Ot@localhost>
@ 2008-03-26  9:31   ` Jan Kara
  2008-03-26  9:33   ` [PATCH] vfs: Skip inodes without pages to free " Jan Kara
  3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2008-03-26  9:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dgc, wfg, linux-kernel

On Tue 25-03-08 12:53:54, Andrew Morton wrote:
> On Tue, 25 Mar 2008 19:12:27 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
> > before calling __invalidate_mapping_pages(). We just have to make sure
> > inode won't go away from under us by keeping reference to it and putting
> > the reference only after we have safely resumed the scan of the inode
> > list. A bit tricky but not too bad...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > CC: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > CC: David Chinner <dgc@sgi.com>
> > 
> > ---
> >  fs/drop_caches.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >  
> >  static void drop_pagecache_sb(struct super_block *sb)
> >  {
> > -	struct inode *inode;
> > +	struct inode *inode, *toput_inode = NULL;
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> >  			continue;
> 
> OT: it might be worth having an `if (mapping->nrpages==0) continue' here.
  Good idea. I'll send a patch in a minute.

> > +		__iget(inode);
> > +		spin_unlock(&inode_lock);
> >  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > +		iput(toput_inode);
> > +		toput_inode = inode;
> > +		spin_lock(&inode_lock);
> >  	}
> >  	spin_unlock(&inode_lock);
> > +	iput(toput_inode);
> >  }
> >  
> >  void drop_pagecache(void)
> 
> hrm.  So we have a random ref on an inode without holding inode_lock.  If
> we race with invalidate_list() we end up with an inode stuck on s_inodes
> and "Self-destruct in 5 seconds.  Have a nice day...", don't we?
  We hold s_umount for reading, so we should be safe against someone trying
to umount. We could possibly race with invalidate_list() called from
check_disk_change(), but removing media without unmounting is bad behavior
anyway. So I think we are fine.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* [PATCH] vfs: Skip inodes without pages to free in drop_pagecache_sb()
  2008-03-25 19:53 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2008-03-26  9:31   ` Jan Kara
@ 2008-03-26  9:33   ` Jan Kara
  3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2008-03-26  9:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, dgc, wfg, linux-kernel


Signed-off-by: Jan Kara <jack@suse.cz>
CC: Fengguang Wu <wfg@mail.ustc.edu.cn>

---
 fs/drop_caches.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index f5aae26..7327a42 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -20,6 +20,8 @@ static void drop_pagecache_sb(struct super_block *sb)
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		if (inode->i_state & (I_FREEING|I_WILL_FREE))
 			continue;
+		if (inode->i_mapping->nrpages == 0)
+			continue;
 		__iget(inode);
 		spin_unlock(&inode_lock);
 		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
-- 
1.5.2.4
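
The filter that this patch extends can be written as a standalone predicate, which may make the two skip conditions easier to see side by side. This is a minimal sketch with hypothetical names (`struct toy_inode`, `struct toy_mapping`, `TOY_FREEING`, `TOY_WILL_FREE`, `should_skip()`), not kernel code. The NULL check on the mapping follows the variant suggested earlier in the thread; the patch above does not include it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits standing in for I_FREEING and I_WILL_FREE. */
#define TOY_FREEING   (1u << 0)
#define TOY_WILL_FREE (1u << 1)

struct toy_mapping {
	unsigned long nrpages; /* number of cached pages */
};

struct toy_inode {
	unsigned int state;
	struct toy_mapping *mapping;
};

/* Return true when there is nothing worth invalidating for this inode. */
static bool should_skip(const struct toy_inode *inode)
{
	/* The inode is already on its way out: leave it alone. */
	if (inode->state & (TOY_FREEING | TOY_WILL_FREE))
		return true;
	/* No cached pages: taking a reference and dropping the lock
	 * would be wasted work. */
	if (!inode->mapping || inode->mapping->nrpages == 0)
		return true;
	return false;
}
```

With the figures reported earlier in the thread (roughly 2.5k of 25k inodes having cached pages), a check of this kind would let the scan skip about nine inodes in ten without dropping the lock.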

