From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [PATCH] fs: Make sure data stored into inode is properly seen before unlocking new inode
Date: Wed, 9 Sep 2009 15:03:34 -0700
Message-ID: <20090909150334.68fcde88.akpm@linux-foundation.org>
References: <1252410063-26872-1-git-send-email-jack@suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, jack@suse.cz, stable@kernel.org
To: Jan Kara
Return-path:
Received: from smtp1.linux-foundation.org ([140.211.169.13]:33518 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753545AbZIIWFd (ORCPT ); Wed, 9 Sep 2009 18:05:33 -0400
In-Reply-To: <1252410063-26872-1-git-send-email-jack@suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, 8 Sep 2009 13:41:03 +0200
Jan Kara wrote:

> In theory it could happen that on one CPU we initialize a new inode but clearing
> of I_NEW | I_LOCK gets reordered before some of the initialization. Thus on
> another CPU we return not fully uptodate inode from iget_locked().
>
> This seems to fix a corruption issue on ext3 mounted over NFS.
>
> Signed-off-by: Jan Kara
> ---
>  fs/inode.c | 1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> Since Al doesn't seem to be online, does anybody else have opinion on this
> patch? I can merge it via my tree but I'd like to get a review from someone
> else.

I'll merge it for 2.6.31.

Please always remember -stable kernels when preparing bugfixes! This
one should have had a Cc:stable in the changelog and in the email
headers.

> diff --git a/fs/inode.c b/fs/inode.c
> index 901bad1..e9a8e77 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -696,6 +696,7 @@ void unlock_new_inode(struct inode *inode)
>  	 * just created it (so there can be no old holders
>  	 * that haven't tested I_LOCK).
>  	 */
> +	smp_mb();
>  	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
>  	inode->i_state &= ~(I_LOCK|I_NEW);
>  	wake_up_inode(inode);

But an uncommented barrier is always a hard thing for a reader to
understand. Let's add something to help people.

How's this look?

--- a/fs/inode.c~fs-make-sure-data-stored-into-inode-is-properly-seen-before-unlocking-new-inode-fix
+++ a/fs/inode.c
@@ -697,12 +697,13 @@ void unlock_new_inode(struct inode *inod
 	}
 #endif
 	/*
-	 * This is special! We do not need the spinlock
-	 * when clearing I_LOCK, because we're guaranteed
-	 * that nobody else tries to do anything about the
-	 * state of the inode when it is locked, as we
-	 * just created it (so there can be no old holders
-	 * that haven't tested I_LOCK).
+	 * This is special! We do not need the spinlock when clearing I_LOCK,
+	 * because we're guaranteed that nobody else tries to do anything about
+	 * the state of the inode when it is locked, as we just created it (so
+	 * there can be no old holders that haven't tested I_LOCK).
+	 * However we must emit the memory barrier so that other CPUs reliably
+	 * see the clearing of I_LOCK after the other inode initialisation has
+	 * completed.
 	 */
 	smp_mb();
 	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
_
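
For anyone who wants to see the ordering the new comment describes outside the kernel, here is a rough userspace sketch. It is not the fs/inode.c code: it uses C11 atomics and pthreads, with atomic_thread_fence(memory_order_seq_cst) standing in loosely for smp_mb(). The names struct obj, OBJ_NEW, creator(), waiter() and run_demo() are all made up for the illustration, and the fence on the waiting side is an assumption of this sketch, not something the patch above adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define OBJ_NEW 0x1u	/* hypothetical stand-in for I_LOCK|I_NEW */

struct obj {
	int payload;		/* stands in for the inode fields being set up */
	atomic_uint state;	/* stands in for i_state */
};

static struct obj shared;

/* Creator: finish initialisation, fence, and only then clear the flag. */
static void *creator(void *arg)
{
	(void)arg;
	shared.payload = 42;
	/* Rough analogue of smp_mb(): the store above may not be
	 * reordered after the flag update below. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_and_explicit(&shared.state, ~OBJ_NEW,
				  memory_order_relaxed);
	return NULL;
}

/* Waiter: spin until the flag is clear, fence, then read the payload. */
static void *waiter(void *arg)
{
	int *seen = arg;

	while (atomic_load_explicit(&shared.state,
				    memory_order_relaxed) & OBJ_NEW)
		;	/* busy-wait; fine for a demo only */
	/* Pairs with the creator's fence (an assumption of this sketch). */
	atomic_thread_fence(memory_order_seq_cst);
	*seen = shared.payload;
	return NULL;
}

/* Runs one creator and one waiter; returns what the waiter observed. */
int run_demo(void)
{
	pthread_t c, w;
	int seen = -1;
	int rc;

	shared.payload = 0;
	atomic_init(&shared.state, OBJ_NEW);

	rc = pthread_create(&w, NULL, waiter, &seen);
	assert(rc == 0);
	rc = pthread_create(&c, NULL, creator, NULL);
	assert(rc == 0);

	pthread_join(c, NULL);
	pthread_join(w, NULL);
	return seen;
}
```

With both fences in place, a waiter that sees the flag cleared is guaranteed to see the initialised payload. Without the creator-side fence, the C11 model would make the plain read of payload a data race, which is the userspace counterpart of returning a not fully uptodate inode.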