From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stuart Sheldon Subject: Re: Possible NFS bug in 2.6.34... Date: Sun, 23 May 2010 10:20:27 -0700 Message-ID: <4BF963DB.1090908@actusa.net> References: <4BF72D34.60701@actusa.net> <1274544580.4860.73.camel@heimdal.trondhjem.org> <4BF803F2.2010506@actusa.net> <1274546973.4860.78.camel@heimdal.trondhjem.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, Stuart Sheldon To: Trond Myklebust Return-path: Received: from internal.actusa.net ([208.83.100.16]:56634 "EHLO internal.actusa.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754008Ab0EWRUe (ORCPT ); Sun, 23 May 2010 13:20:34 -0400 In-Reply-To: <1274546973.4860.78.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Trond Myklebust wrote: > On Sat, 2010-05-22 at 09:18 -0700, Stuart Sheldon wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA256 >> >> On 05/22/2010 09:09 AM, Trond Myklebust wrote: >>> Do you see any more NFS traffic to the server when the above hang >>> occurs? I'm wondering if we don't need something like the following >>> patch. >>> >>> Cheers >>> Trond >>> -------------------------------------------------------------------------------- >>> From 0b574497e05f62fd49cfe26f1b97e3669525446c Mon Sep 17 00:00:00 2001 >>> From: Trond Myklebust >>> Date: Sat, 22 May 2010 11:49:19 -0400 >>> Subject: [PATCH] NFS: Ensure that we mark the inode as dirty if we exit early from commit >>> >>> If we exit from nfs_commit_inode() without ensuring that the COMMIT rpc >>> call has been completed, we must re-mark the inode as dirty. Otherwise, >>> future calls to sync_inode() with the WB_SYNC_ALL flag set will fail to >>> ensure that the data is on the disk. >>> >>> Signed-off-by: Trond Myklebust >>> Cc: stable@kernel.org >>> --- >>> fs/nfs/write.c | 13 +++++++++++-- >>> 1 files changed, 11 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c >>> index 3aea3ca..b8a6d7a 100644 >>> --- a/fs/nfs/write.c >>> +++ b/fs/nfs/write.c >>> @@ -1386,7 +1386,7 @@ static int nfs_commit_inode(struct inode *inode, int how) >>> int res = 0; >>> >>> if (!nfs_commit_set_lock(NFS_I(inode), may_wait)) >>> - goto out; >>> + goto out_mark_dirty; >>> spin_lock(&inode->i_lock); >>> res = nfs_scan_commit(inode, &head, 0, 0); >>> spin_unlock(&inode->i_lock); >>> @@ -1398,9 +1398,18 @@ static int nfs_commit_inode(struct inode *inode, int how) >>> wait_on_bit(&NFS_I(inode)->flags, NFS_INO_COMMIT, >>> nfs_wait_bit_killable, >>> TASK_KILLABLE); >>> + else >>> + goto out_mark_dirty; >>> } else >>> nfs_commit_clear_lock(NFS_I(inode)); >>> -out: >>> + return res; >>> + /* Note: If we exit without ensuring that the commit is complete, >>> + * we must mark the inode as dirty. Otherwise, future calls to >>> + * sync_inode() with the WB_SYNC_ALL flag set will fail to ensure >>> + * that the data is on the disk. >>> + */ >>> +out_mark_dirty: >>> + __mark_inode_dirty(inode, I_DIRTY_DATASYNC); >>> return res; >>> } >>> >> Trond, >> >> When it occurred, it continues to throw those errors in the log, and all >> access to the NFS mount stalled until I hard reset the client system. >> >> Do you want me to apply the patch and see if I can recreate the condition? > > Yes, please do. Could you also apply the following debugging patch on > top of the above one, and see if the WARN_ON() triggers when both > patches are applied? > > Cheers > Trond > ------------------------------------------------------------------------------------------------ > From 9883e35957468987f4338525c1d800d637bc05b7 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust > Date: Sat, 22 May 2010 10:46:41 -0400 > Subject: [PATCH 2/2] NFS: debugging code for nfs_wb_page() > > Signed-off-by: Trond Myklebust > --- > fs/nfs/write.c | 9 +++++++++ > 1 files changed, 9 insertions(+), 0 deletions(-) > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > index b8a6d7a..0558fab 100644 > --- a/fs/nfs/write.c > +++ b/fs/nfs/write.c > @@ -1519,12 +1519,21 @@ int nfs_wb_page(struct inode *inode, struct page *page) > int ret; > > while(PagePrivate(page)) { > + unsigned dirty; > + int syncing; > + > wait_on_page_writeback(page); > if (clear_page_dirty_for_io(page)) { > ret = nfs_writepage_locked(page, &wbc); > if (ret < 0) > goto out_error; > + continue; > } > + > + dirty = inode->i_state & I_DIRTY; > + syncing = inode->i_state & I_SYNC; > + WARN_ON(!syncing && !dirty && PagePrivate(page)); > + > ret = sync_inode(inode, &wbc); > if (ret < 0) > goto out_error; The problem seems to be fixed with this, but I'm not seeing / don't know where to find the 'WARN_ON' messages. If they are suppose to be in the syslog, then there weren't any. I'm rolling back to the unpatched kernel to verify that I can still reproduce the problem natively. Will follow up on Monday. Stu - -- If you took all the girls I knew When I was single And brought them all together for one night I know theyd never match My sweet imagination And everything looks worse in black and white -- Paul Simon - "Kodachrome Lyrics" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAEBCAAGBQJL+WPYAAoJEFKVLITDJSGSU0sP/jdJt9NiliGFJ0IB3I4Pt6o/ aHI5wzWWG8Uxzn5UXBumEp79hWXZ8D79kLZ4L8zh9/9hpi8rbExz1ci9IJmjU1LW qWQVDqmv36uKX9YUzmj8d4505G0Czf9BkU6vsOy4elZ4pAy/Q9EXKFVS5mtirO7u 9FeYJebvbhvdICJTaLDbryugpxWYV6P6bGVglowdbqVWBnKo5QXevWnm6s3Lc1Jd girpqkQ2f4NddfeW1TbITtBr0bEPYuhK4s4XMdWiYIHNIaRSBJDF5Hlues8LWxu2 ++4xz1G7n/K59hRBRX+giBGaSXXl/GSGib87RfwSCrg5qEytNbSRKQX0WuFFxARS tTbU+zwDpUF7SSvYJZGDh2EEPr2QNfOVCCxVmf1Oe4JAs0OJku1z1ReKpr+CoZg2 lgIFl59bPBNjMcx8GNynnJgTW1IMXWsM8UpTpiAfwTpffXaW3YEH2V855Px9Mkqt ONuvCll3CWEbwdmisWqvqRAix7oHNh4VMnDOTfb1eYf5ytw5mLfxZaznXhwL8FjE pUvHZG4TRMUENjucs38dvwx4Vx63DEdxMSK5C0GpdsI16xh0hMKa3ohWaVtSgwIE Emf1HU2G0vdxl+zMI1IyerNp1T+oxu1rr7eOYWzl3HO8bQc+9ua7yCntpni/c7Dz Ge8hUPZTZAVmFRRy9zBO =2+R2 -----END PGP SIGNATURE-----