From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from fieldses.org ([174.143.236.118]:45514 "EHLO fieldses.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1757144Ab1JSRgM (ORCPT ); Wed, 19 Oct 2011 13:36:12 -0400
Date: Wed, 19 Oct 2011 13:36:11 -0400
To: Steve Dickson
Cc: Jeff Layton, Linux NFS Mailing list
Subject: Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write
Message-ID: <20111019173611.GC32028@fieldses.org>
References: <1319038470-17750-1-git-send-email-steved@redhat.com>
 <20111019123626.7a80dfad@corrin.poochiereds.net>
 <4E9F047B.5000600@RedHat.com>
 <20111019132230.6cd85a0c@corrin.poochiereds.net>
 <4E9F0952.2040607@RedHat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <4E9F0952.2040607@RedHat.com>
From: "J. Bruce Fields"
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Wed, Oct 19, 2011 at 01:30:58PM -0400, Steve Dickson wrote:
>
>
> On 10/19/2011 01:22 PM, Jeff Layton wrote:
> > On Wed, 19 Oct 2011 13:10:19 -0400
> > Steve Dickson wrote:
> >
> >>
> >>
> >> On 10/19/2011 12:36 PM, Jeff Layton wrote:
> >>> On Wed, 19 Oct 2011 11:34:30 -0400
> >>> Steve Dickson wrote:
> >>>
> >>>> This patch is a following on to commit 7a802337. Using the
> >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916
> >>>> caused the fflush() and fclose() to fail in turn causing
> >>>> corruption in the mtab.
> >>>>
> >>>> The failures were in the internals of both calls. Switch those
> >>>> calls with the actual system calls eliminated the failures.
> >>>>
> >>>> Signed-off-by: Steve Dickson
> >>>> ---
> >>>>  support/nfs/nfs_mntent.c | 4 ++--
> >>>>  1 files changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c
> >>>> index a2118a2..b80f270 100644
> >>>> --- a/support/nfs/nfs_mntent.c
> >>>> +++ b/support/nfs/nfs_mntent.c
> >>>> @@ -117,7 +117,7 @@ void
> >>>>  nfs_endmntent (mntFILE *mfp) {
> >>>>      if (mfp) {
> >>>>          if (mfp->mntent_fp)
> >>>> -            fclose(mfp->mntent_fp);
> >>>> +            close(fileno(mfp->mntent_fp));
> >>>>          if (mfp->mntent_file)
> >>>>              free(mfp->mntent_file);
> >>>>          free(mfp);
> >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) {
> >>>>      free(m3);
> >>>>      free(m4);
> >>>>      if (res >= 0) {
> >>>> -        res = fflush(mfp->mntent_fp);
> >>>> +        res = fsync(fileno(mfp->mntent_fp));
> >>>
> >>> fsync doesn't imply an fflush. With this, I think you may end up
> >>> without everything being committed to disk if part or all of it is
> >>> still in the file stream buffer. You probably want to do an fflush()
> >>> and then an fsync here.
> >> The problem was with the fflush() call. The call was causing the
> >> mount to drop core in turn causing mtab corruption. Changing that
> >> call to a fsync() worked just fine... no corruption... every time!
> >>
> >
> > Ahh, then you have another problem here too then. Most likely it was
> > crashing because it caught a SIGXFSZ. Writing out the mtab should not
> > be affected by signals.
> So calling fflush() generates a SIGXFSZ and call fsync() does not...

fflush() must hit this because it's calling write() to write out the
stream buffer....

But lock_mtab() should have set SIGXFSZ to be ignored; is that not
happening?

> I really don't see what the problem is is call simply calling fsync()
> which clearly works?

We want to make sure the problem's really fixed.

--b.
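
For reference, a minimal sketch of the two points discussed above: fsync()
only commits what has already reached the kernel, so the stdio buffer has
to be fflush()ed first, and with SIGXFSZ ignored a write past RLIMIT_FSIZE
should come back as an ordinary error (EFBIG) rather than killing the
process. This is not the nfs-utils code; flush_and_sync() and
write_with_fsize_limit() are hypothetical names made up for illustration,
and the signal handling here only stands in for whatever lock_mtab() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Push stdio's user-space buffer to the kernel, then ask the kernel to
 * commit it to disk. fsync() alone never sees data still held by stdio.
 * Returns 0 on success, -1 on failure with errno set. */
static int flush_and_sync(FILE *fp)
{
    if (fflush(fp) == EOF)
        return -1;      /* e.g. EFBIG once RLIMIT_FSIZE is reached */
    if (fsync(fileno(fp)) < 0)
        return -1;
    return 0;
}

/* Hypothetical demo helper (not from nfs-utils): write nbytes to path
 * with the soft RLIMIT_FSIZE lowered to `limit` and SIGXFSZ ignored.
 * Returns 0 if everything was flushed and synced, -1 otherwise; errno
 * from the first failure is preserved for the caller. */
static int write_with_fsize_limit(const char *path, rlim_t limit, size_t nbytes)
{
    struct rlimit old_lim, new_lim;
    struct sigaction ign, old_sa;
    FILE *fp;
    size_t i;
    int res = -1, saved;

    if (getrlimit(RLIMIT_FSIZE, &old_lim) < 0)
        return -1;
    new_lim = old_lim;
    new_lim.rlim_cur = limit;   /* soft limit only; must not exceed the hard limit */

    /* Ignore SIGXFSZ so an over-limit write() fails instead of dumping core. */
    memset(&ign, 0, sizeof(ign));
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGXFSZ, &ign, &old_sa) < 0)
        return -1;

    fp = fopen(path, "w");
    if (fp != NULL && setrlimit(RLIMIT_FSIZE, &new_lim) == 0) {
        res = 0;
        for (i = 0; i < nbytes && res == 0; i++)
            if (fputc('x', fp) == EOF)
                res = -1;
        if (res == 0)
            res = flush_and_sync(fp);
    }
    saved = errno;

    /* fclose() retries the flush; with SIGXFSZ ignored it can only fail again. */
    if (fp != NULL)
        fclose(fp);
    setrlimit(RLIMIT_FSIZE, &old_lim);
    sigaction(SIGXFSZ, &old_sa, NULL);
    errno = saved;
    return res;
}
```

Calling write_with_fsize_limit() with a limit smaller than nbytes should
return -1 with errno set to EFBIG, which is the case the caller has to
check for before trusting the new mtab; with a limit larger than nbytes it
should return 0.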