From: Wu Fengguang <fengguang.wu@intel.com>
To: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.cz>, Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH][BUG] Lack of mutex_lock in drop_pagecache_sb()
Date: Tue, 24 Mar 2009 15:44:57 +0800
Message-ID: <20090324074457.GA7745@localhost>
In-Reply-To: <20090324155655.2684.61FB500B@jp.fujitsu.com>

Hi Masayoshi,

On Tue, Mar 24, 2009 at 03:06:45PM +0800, Masayoshi MIZUMA wrote:
> Hi, Fengguang
> 
> On Mon, 23 Mar 2009 18:38:46 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Masayoshi,
> > 
> > On Wed, Mar 18, 2009 at 05:13:35PM +0900, Masasyoshi MIZUMA wrote:
> > > I have created a patch which fixes the missing mutex_lock in drop_pagecache_sb().
> > > Please check the bug and the patch (below).
> > 
> Thank you for your comment, and I apologize for my insufficient
> explanation.
> 
> > Is this a real, reproducible bug or a theoretical one?
> This is a real bug.
> 
> > IMHO the I_FREEING flag should avoid the race.
> Let me add to my explanation of the problem.
> 
> clear_inode() is called by dispose_list() and sets the inode's
> i_state to I_CLEAR, which overwrites I_FREEING. Therefore, the
> following check no longer matches the inode:
> "if (inode->i_state & (I_FREEING|I_WILL_FREE)) continue;"
> As a result, the problem can happen.
> 
> > 
> > > ----------------------------------------------------------------------
> > > 
> > > When drop_pagecache_sb() frees inodes, it does not take
> > > iprune_mutex. Therefore, if it races with a process that frees inodes
> > > (e.g. prune_icache()), the OS may panic.
> > > 
> > > An example of the panic flow is the following:
> > > ----------------------------------------------------------------------
> > >             [process A]               |         [process B]
> > >  |                                    |
> > >  |  shrink_icache_memory()            |
> > >  |      |                             |
> > >  |      V                             |
> > >  |    prune_icache()                  |  drop_pagecache()
> > >  |      mutex_lock(&iprune_mutex)     |      |
> > >  |      spin_lock(&inode_lock)        |      |
> > >  |          |                         |      V
> > >  |          |                         |    drop_pagecache_sb()
> > >  |          |                         |        |
> > 
> >           inode->i_state |= I_FREEING;
> > 
> > >  |          V                         |        V
> > >  |      spin_unlock(&inode_lock)      |      spin_lock(&inode_lock)
> > >  |          |                         |          |
> > 
> >                                                 if (inode->i_state & (I_FREEING|I_WILL_FREE))
> >                                                         continue;
> > 
> > >  |          |                         |          |
> > >  |          V                         |          V
> > >  |      dispose_list()                |        __iget()
> > >  |        list_del()                  |            |
> > >  |            |                       |            |
> > >  |            V                       |            V
> > >  |        spin_lock(&inode_lock)      |          list_move() <----- PANIC !!
> > >  |                                    |
> > >  V                                    |
> > > (time)
> > > ----------------------------------------------------------------------
> > > If the inode on which process B does list_move() is the same as the one on
> > > which process A did list_del(), the OS may panic.
> 
> I applied your comment and modified the panic flow figure.
> Please see below:
> ----------------------------------------------------------------------
>             [process A]               |        [process B]
>  |                                    |
>  |  shrink_icache_memory()            |
>  |      |                             |
>  |      V                             |
>  |    prune_icache()                  | drop_pagecache()
>  |      mutex_lock(&iprune_mutex)     |     |
>  |      spin_lock(&inode_lock)        |     |
>  |          |                         |     V
>  |          |                         |   drop_pagecache_sb()
>  |          |                         |       |
>  |          V                         |       |
>  |      inode->i_state |= I_FREEING;  |       |
>  |          |                         |       |
>  |          V                         |       V
>  |      spin_unlock(&inode_lock)      |     spin_lock(&inode_lock)
>  |          |                         |         |
>  |          |                         |         |
>  |          V                         |         |
>  |      dispose_list()                |         |
>  |        list_del()                  |         |
>  |            |                       |         |
>  |            V                       |         |
>  |        clear_inode()               |         |
>  |          inode->i_state = I_CLEAR  |         |
>  |            |                       |         |
>  |            |                       |         V
>  |            |                       |      if (inode->i_state & (I_FREEING|I_WILL_FREE))
>  |            |                       |              continue;           <---- NOT MATCH
>  |            |                       |           |
>  |            |                       |           V
>  |            |                       |      __iget()   
>  |            |                       |            |
>  |            V                       |            V
>  |        spin_lock(&inode_lock)      |        list_move() <----- PANIC !!
>  |                                    |
>  V                                    |
> (time)
> ----------------------------------------------------------------------

Ah thanks for the explanation!

How about this lightweight fix, which makes drop_pagecache_sb() skip
inodes in the I_CLEAR state as well? Since s_umount is already taken in
drop_pagecache(), it's not necessary to take iprune_mutex again.
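
To make the reasoning concrete, here is a minimal userspace sketch of the
two checks. It is not kernel code: struct fake_inode, mark_freeing(),
model_clear_inode() and the skipped_by_*() helpers are hypothetical names
made up for illustration, and the flag values are assumed rather than
copied from the tree. It only models what the figure above describes:
I_FREEING is ORed in first, then clear_inode() assigns I_CLEAR and so
drops I_FREEING.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits: values assumed for this sketch, only their distinctness matters. */
#define I_WILL_FREE	(1UL << 4)
#define I_FREEING	(1UL << 5)
#define I_CLEAR		(1UL << 6)

/* Hypothetical stand-in for the i_state field of struct inode. */
struct fake_inode {
	unsigned long i_state;
};

/* Models "inode->i_state |= I_FREEING" done under inode_lock by process A. */
static void mark_freeing(struct fake_inode *inode)
{
	inode->i_state |= I_FREEING;
}

/* Models "inode->i_state = I_CLEAR": an assignment, so I_FREEING is lost. */
static void model_clear_inode(struct fake_inode *inode)
{
	inode->i_state = I_CLEAR;
}

/* The check in drop_pagecache_sb() before the patch. */
static bool skipped_by_old_check(const struct fake_inode *inode)
{
	return (inode->i_state & (I_FREEING | I_WILL_FREE)) != 0;
}

/* The check with I_CLEAR added to the mask, as in the patch. */
static bool skipped_by_new_check(const struct fake_inode *inode)
{
	return (inode->i_state & (I_FREEING | I_CLEAR | I_WILL_FREE)) != 0;
}
```

In this model, an inode that has gone through mark_freeing() and then
model_clear_inode() is no longer skipped by the old check, but is still
skipped by the new one, which is the window the one-line change is meant
to close.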

Thanks,
Fengguang
---
 fs/drop_caches.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mm.orig/fs/drop_caches.c
+++ mm/fs/drop_caches.c
@@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
+		if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
 			continue;
 		if (inode->i_mapping->nrpages == 0)
 			continue;
