From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: Re: [PATCH][BUG] Lack of mutex_lock in drop_pagecache_sb() Date: Tue, 24 Mar 2009 15:44:57 +0800 Message-ID: <20090324074457.GA7745@localhost> References: <20090318170237.8F6C.61FB500B@jp.fujitsu.com> <20090323103846.GA16577@localhost> <20090324155655.2684.61FB500B@jp.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "linux-fsdevel@vger.kernel.org" , "viro@zeniv.linux.org.uk" , Jan Kara , Nick Piggin To: Masayoshi MIZUMA Return-path: Received: from mga03.intel.com ([143.182.124.21]:64723 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757274AbZCXHpN (ORCPT ); Tue, 24 Mar 2009 03:45:13 -0400 Content-Disposition: inline In-Reply-To: <20090324155655.2684.61FB500B@jp.fujitsu.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi Masayoshi, On Tue, Mar 24, 2009 at 03:06:45PM +0800, Masayoshi MIZUMA wrote: > Hi, Fengguang > > On Mon, 23 Mar 2009 18:38:46 +0800 > Wu Fengguang wrote: > > > Masasyoshi, > > > > On Wed, Mar 18, 2009 at 05:13:35PM +0900, Masasyoshi MIZUMA wrote: > > > I create the patch which fixes lack of mutex_lock in drop_pagecache_sb(). > > > Please check the bug and the patch (below). > > > Thank you for your comment, and I apologize to you for my lack > of explanation. > > > Is this a real producible bug or a theory one? > This is a real bug. > > > IMHO the I_FREEING flag should avoid the race. > I supplement the explanation for this problem. > > clear_inode() is called by dispose_list(), and sets the inode's > i_state to I_CLEAR. Therefore, the following conditional expression > doesn't match for the inode: > "if (inode->i_state & (I_FREEING|I_WILL_FREE)) continue;" > As the result, this problem can happen. > > > > > > ---------------------------------------------------------------------- > > > > > > When drop_pagecache_sb() frees inodes, it doesn't get mutex_lock of > > > iprune_mutex. Therefore, if it races the process which frees inodes > > > (ex. prune_icache()), OS panic may happen. > > > > > > An example of the panic flow is the following: > > > ---------------------------------------------------------------------- > > > [process A] | [process B] > > > | | > > > | shrink_icache_memory() | > > > | | | > > > | V | > > > | prune_icache() | drop_pagecache() > > > | mutex_lock(&iprune_mutex) | | > > > | spin_lock(&inode_lock) | | > > > | | | V > > > | | | drop_pagecache_sb() > > > | | | | > > > > inode->i_state |= I_FREEING; > > > > > | V | V > > > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > > > | | | | > > > > if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > continue; > > > > > | | | | > > > | V | V > > > | dispose_list() | __iget() > > > | list_del() | | > > > | | | | > > > | V | V > > > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > > > | | > > > V | > > > (time) > > > ---------------------------------------------------------------------- > > > If the inode which Process B do list_move() with is the same as the one which > > > Process A did list_del() with, OS may panic. > > I applied your comment and then modified the panic flow figure. > Please check below: > ---------------------------------------------------------------------- > [process A] | [process B] > | | > | shrink_icache_memory() | > | | | > | V | > | prune_icache() | drop_pagecache() > | mutex_lock(&iprune_mutex) | | > | spin_lock(&inode_lock) | | > | | | V > | | | drop_pagecache_sb() > | | | | > | V | | > | inode->i_state |= I_FREEING; | | > | | | | > | V | V > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > | | | | > | | | | > | V | | > | dispose_list() | | > | list_del() | | > | | | | > | V | | > | clear_inode() | | > | inode->i_state = I_CLEAR | | > | | | | > | | | V > | | | if (inode->i_state & (I_FREEING|I_WILL_FREE)) > | | | continue; <---- NOT MATCH > | | | | > | | | V > | | | __iget() > | | | | > | V | V > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > | | > V | > (time) > ---------------------------------------------------------------------- Ah thanks for the explanation! How about this lightweight fix? Since s_umount is already taken in drop_pagecache(), it's not necessary to take iprune_mutex again. Thanks, Fengguang --- fs/drop_caches.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- mm.orig/fs/drop_caches.c +++ mm/fs/drop_caches.c @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup spin_lock(&inode_lock); list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - if (inode->i_state & (I_FREEING|I_WILL_FREE)) + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) continue; if (inode->i_mapping->nrpages == 0) continue;