* [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write @ 2011-10-19 15:34 Steve Dickson 2011-10-19 16:36 ` Jeff Layton 2011-10-19 19:44 ` Steve Dickson 0 siblings, 2 replies; 16+ messages in thread From: Steve Dickson @ 2011-10-19 15:34 UTC (permalink / raw) To: Linux NFS Mailing list This patch is a following on to commit 7a802337. Using the tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 caused the fflush() and fclose() to fail in turn causing corruption in the mtab. The failures were in the internals of both calls. Switch those calls with the actual system calls eliminated the failures. Signed-off-by: Steve Dickson <steved@redhat.com> --- support/nfs/nfs_mntent.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c index a2118a2..b80f270 100644 --- a/support/nfs/nfs_mntent.c +++ b/support/nfs/nfs_mntent.c @@ -117,7 +117,7 @@ void nfs_endmntent (mntFILE *mfp) { if (mfp) { if (mfp->mntent_fp) - fclose(mfp->mntent_fp); + close(fileno(mfp->mntent_fp)); if (mfp->mntent_file) free(mfp->mntent_file); free(mfp); @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { free(m3); free(m4); if (res >= 0) { - res = fflush(mfp->mntent_fp); + res = fsync(fileno(mfp->mntent_fp)); if (res < 0) /* Avoid leaving a corrupt mtab file */ ftruncate(fileno(mfp->mntent_fp), length); -- 1.7.6.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 15:34 [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write Steve Dickson @ 2011-10-19 16:36 ` Jeff Layton 2011-10-19 17:10 ` Steve Dickson 2011-10-19 19:44 ` Steve Dickson 1 sibling, 1 reply; 16+ messages in thread From: Jeff Layton @ 2011-10-19 16:36 UTC (permalink / raw) To: Steve Dickson; +Cc: Linux NFS Mailing list On Wed, 19 Oct 2011 11:34:30 -0400 Steve Dickson <steved@redhat.com> wrote: > This patch is a following on to commit 7a802337. Using the > tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > caused the fflush() and fclose() to fail in turn causing > corruption in the mtab. > > The failures were in the internals of both calls. Switch those > calls with the actual system calls eliminated the failures. > > Signed-off-by: Steve Dickson <steved@redhat.com> > --- > support/nfs/nfs_mntent.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > index a2118a2..b80f270 100644 > --- a/support/nfs/nfs_mntent.c > +++ b/support/nfs/nfs_mntent.c > @@ -117,7 +117,7 @@ void > nfs_endmntent (mntFILE *mfp) { > if (mfp) { > if (mfp->mntent_fp) > - fclose(mfp->mntent_fp); > + close(fileno(mfp->mntent_fp)); > if (mfp->mntent_file) > free(mfp->mntent_file); > free(mfp); > @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > free(m3); > free(m4); > if (res >= 0) { > - res = fflush(mfp->mntent_fp); > + res = fsync(fileno(mfp->mntent_fp)); fsync doesn't imply an fflush. With this, I think you may end up without everything being committed to disk if part or all of it is still in the file stream buffer. You probably want to do an fflush() and then an fsync here. > if (res < 0) > /* Avoid leaving a corrupt mtab file */ > ftruncate(fileno(mfp->mntent_fp), length); -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 16:36 ` Jeff Layton @ 2011-10-19 17:10 ` Steve Dickson 2011-10-19 17:22 ` Jeff Layton 2011-10-19 17:28 ` J. Bruce Fields 0 siblings, 2 replies; 16+ messages in thread From: Steve Dickson @ 2011-10-19 17:10 UTC (permalink / raw) To: Jeff Layton; +Cc: Linux NFS Mailing list On 10/19/2011 12:36 PM, Jeff Layton wrote: > On Wed, 19 Oct 2011 11:34:30 -0400 > Steve Dickson <steved@redhat.com> wrote: > >> This patch is a following on to commit 7a802337. Using the >> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >> caused the fflush() and fclose() to fail in turn causing >> corruption in the mtab. >> >> The failures were in the internals of both calls. Switch those >> calls with the actual system calls eliminated the failures. >> >> Signed-off-by: Steve Dickson <steved@redhat.com> >> --- >> support/nfs/nfs_mntent.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >> index a2118a2..b80f270 100644 >> --- a/support/nfs/nfs_mntent.c >> +++ b/support/nfs/nfs_mntent.c >> @@ -117,7 +117,7 @@ void >> nfs_endmntent (mntFILE *mfp) { >> if (mfp) { >> if (mfp->mntent_fp) >> - fclose(mfp->mntent_fp); >> + close(fileno(mfp->mntent_fp)); >> if (mfp->mntent_file) >> free(mfp->mntent_file); >> free(mfp); >> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >> free(m3); >> free(m4); >> if (res >= 0) { >> - res = fflush(mfp->mntent_fp); >> + res = fsync(fileno(mfp->mntent_fp)); > > fsync doesn't imply an fflush. With this, I think you may end up > without everything being committed to disk if part or all of it is > still in the file stream buffer. You probably want to do an fflush() > and then an fsync here. The problem was with the fflush() call. The call was causing the mount to drop core in turn causing mtab corruption. Changing that call to a fsync() worked just fine... no corruption... every time! steved. > >> if (res < 0) >> /* Avoid leaving a corrupt mtab file */ >> ftruncate(fileno(mfp->mntent_fp), length); > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:10 ` Steve Dickson @ 2011-10-19 17:22 ` Jeff Layton 2011-10-19 17:30 ` Steve Dickson 2011-10-19 17:28 ` J. Bruce Fields 1 sibling, 1 reply; 16+ messages in thread From: Jeff Layton @ 2011-10-19 17:22 UTC (permalink / raw) To: Steve Dickson; +Cc: Linux NFS Mailing list On Wed, 19 Oct 2011 13:10:19 -0400 Steve Dickson <SteveD@redhat.com> wrote: > > > On 10/19/2011 12:36 PM, Jeff Layton wrote: > > On Wed, 19 Oct 2011 11:34:30 -0400 > > Steve Dickson <steved@redhat.com> wrote: > > > >> This patch is a following on to commit 7a802337. Using the > >> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >> caused the fflush() and fclose() to fail in turn causing > >> corruption in the mtab. > >> > >> The failures were in the internals of both calls. Switch those > >> calls with the actual system calls eliminated the failures. > >> > >> Signed-off-by: Steve Dickson <steved@redhat.com> > >> --- > >> support/nfs/nfs_mntent.c | 4 ++-- > >> 1 files changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >> index a2118a2..b80f270 100644 > >> --- a/support/nfs/nfs_mntent.c > >> +++ b/support/nfs/nfs_mntent.c > >> @@ -117,7 +117,7 @@ void > >> nfs_endmntent (mntFILE *mfp) { > >> if (mfp) { > >> if (mfp->mntent_fp) > >> - fclose(mfp->mntent_fp); > >> + close(fileno(mfp->mntent_fp)); > >> if (mfp->mntent_file) > >> free(mfp->mntent_file); > >> free(mfp); > >> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >> free(m3); > >> free(m4); > >> if (res >= 0) { > >> - res = fflush(mfp->mntent_fp); > >> + res = fsync(fileno(mfp->mntent_fp)); > > > > fsync doesn't imply an fflush. With this, I think you may end up > > without everything being committed to disk if part or all of it is > > still in the file stream buffer. You probably want to do an fflush() > > and then an fsync here. > The problem was with the fflush() call. The call was causing the > mount to drop core in turn causing mtab corruption. Changing that > call to a fsync() worked just fine... no corruption... every time! > Ahh, then you have another problem here too then. Most likely it was crashing because it caught a SIGXFSZ. Writing out the mtab should not be affected by signals. In the mount.cifs helper, I have it do the following before altering the mtab (with appropriate error handling): rc = setreuid(geteuid(), -1); rc = sigfillset(&mask); rc = sigprocmask(SIG_SETMASK, &mask, &oldmask); IOW, set the real uid to the effective UID to ensure that an unprivileged user can't signal the process if it was run as a setuid root program and the real UID isn't root. It then masks off all signals. That leaves SIGKILL by root as a way to interrupt it but there's really nothing you can do about that. > > > > >> if (res < 0) > >> /* Avoid leaving a corrupt mtab file */ > >> ftruncate(fileno(mfp->mntent_fp), length); > > > > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:22 ` Jeff Layton @ 2011-10-19 17:30 ` Steve Dickson 2011-10-19 17:36 ` J. Bruce Fields 2011-10-19 17:40 ` Jeff Layton 0 siblings, 2 replies; 16+ messages in thread From: Steve Dickson @ 2011-10-19 17:30 UTC (permalink / raw) To: Jeff Layton; +Cc: Linux NFS Mailing list On 10/19/2011 01:22 PM, Jeff Layton wrote: > On Wed, 19 Oct 2011 13:10:19 -0400 > Steve Dickson <SteveD@redhat.com> wrote: > >> >> >> On 10/19/2011 12:36 PM, Jeff Layton wrote: >>> On Wed, 19 Oct 2011 11:34:30 -0400 >>> Steve Dickson <steved@redhat.com> wrote: >>> >>>> This patch is a following on to commit 7a802337. Using the >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >>>> caused the fflush() and fclose() to fail in turn causing >>>> corruption in the mtab. >>>> >>>> The failures were in the internals of both calls. Switch those >>>> calls with the actual system calls eliminated the failures. >>>> >>>> Signed-off-by: Steve Dickson <steved@redhat.com> >>>> --- >>>> support/nfs/nfs_mntent.c | 4 ++-- >>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >>>> index a2118a2..b80f270 100644 >>>> --- a/support/nfs/nfs_mntent.c >>>> +++ b/support/nfs/nfs_mntent.c >>>> @@ -117,7 +117,7 @@ void >>>> nfs_endmntent (mntFILE *mfp) { >>>> if (mfp) { >>>> if (mfp->mntent_fp) >>>> - fclose(mfp->mntent_fp); >>>> + close(fileno(mfp->mntent_fp)); >>>> if (mfp->mntent_file) >>>> free(mfp->mntent_file); >>>> free(mfp); >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >>>> free(m3); >>>> free(m4); >>>> if (res >= 0) { >>>> - res = fflush(mfp->mntent_fp); >>>> + res = fsync(fileno(mfp->mntent_fp)); >>> >>> fsync doesn't imply an fflush. With this, I think you may end up >>> without everything being committed to disk if part or all of it is >>> still in the file stream buffer. You probably want to do an fflush() >>> and then an fsync here. >> The problem was with the fflush() call. The call was causing the >> mount to drop core in turn causing mtab corruption. Changing that >> call to a fsync() worked just fine... no corruption... every time! >> > > Ahh, then you have another problem here too then. Most likely it was > crashing because it caught a SIGXFSZ. Writing out the mtab should not > be affected by signals. So calling fflush() generates a SIGXFSZ and call fsync() does not... I really don't see what the problem is is call simply calling fsync() which clearly works? steved. > > In the mount.cifs helper, I have it do the following before altering > the mtab (with appropriate error handling): > > rc = setreuid(geteuid(), -1); > rc = sigfillset(&mask); > rc = sigprocmask(SIG_SETMASK, &mask, &oldmask); > > > IOW, set the real uid to the effective UID to ensure that an > unprivileged user can't signal the process if it was run as a setuid > root program and the real UID isn't root. It then masks off all > signals. That leaves SIGKILL by root as a way to interrupt it but > there's really nothing you can do about that. > >> >>> >>>> if (res < 0) >>>> /* Avoid leaving a corrupt mtab file */ >>>> ftruncate(fileno(mfp->mntent_fp), length); >>> >>> > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:30 ` Steve Dickson @ 2011-10-19 17:36 ` J. Bruce Fields 2011-10-19 18:38 ` Steve Dickson 2011-10-19 17:40 ` Jeff Layton 1 sibling, 1 reply; 16+ messages in thread From: J. Bruce Fields @ 2011-10-19 17:36 UTC (permalink / raw) To: Steve Dickson; +Cc: Jeff Layton, Linux NFS Mailing list On Wed, Oct 19, 2011 at 01:30:58PM -0400, Steve Dickson wrote: > > > On 10/19/2011 01:22 PM, Jeff Layton wrote: > > On Wed, 19 Oct 2011 13:10:19 -0400 > > Steve Dickson <SteveD@redhat.com> wrote: > > > >> > >> > >> On 10/19/2011 12:36 PM, Jeff Layton wrote: > >>> On Wed, 19 Oct 2011 11:34:30 -0400 > >>> Steve Dickson <steved@redhat.com> wrote: > >>> > >>>> This patch is a following on to commit 7a802337. Using the > >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >>>> caused the fflush() and fclose() to fail in turn causing > >>>> corruption in the mtab. > >>>> > >>>> The failures were in the internals of both calls. Switch those > >>>> calls with the actual system calls eliminated the failures. > >>>> > >>>> Signed-off-by: Steve Dickson <steved@redhat.com> > >>>> --- > >>>> support/nfs/nfs_mntent.c | 4 ++-- > >>>> 1 files changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >>>> index a2118a2..b80f270 100644 > >>>> --- a/support/nfs/nfs_mntent.c > >>>> +++ b/support/nfs/nfs_mntent.c > >>>> @@ -117,7 +117,7 @@ void > >>>> nfs_endmntent (mntFILE *mfp) { > >>>> if (mfp) { > >>>> if (mfp->mntent_fp) > >>>> - fclose(mfp->mntent_fp); > >>>> + close(fileno(mfp->mntent_fp)); > >>>> if (mfp->mntent_file) > >>>> free(mfp->mntent_file); > >>>> free(mfp); > >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >>>> free(m3); > >>>> free(m4); > >>>> if (res >= 0) { > >>>> - res = fflush(mfp->mntent_fp); > >>>> + res = fsync(fileno(mfp->mntent_fp)); > >>> > >>> fsync doesn't imply an fflush. With this, I think you may end up > >>> without everything being committed to disk if part or all of it is > >>> still in the file stream buffer. You probably want to do an fflush() > >>> and then an fsync here. > >> The problem was with the fflush() call. The call was causing the > >> mount to drop core in turn causing mtab corruption. Changing that > >> call to a fsync() worked just fine... no corruption... every time! > >> > > > > Ahh, then you have another problem here too then. Most likely it was > > crashing because it caught a SIGXFSZ. Writing out the mtab should not > > be affected by signals. > So calling fflush() generates a SIGXFSZ and call fsync() does not... fflush() must hit this because it's calling write() to write out the stream buffer.... But lock_mtab() should have set SIGXFSZ to be ignored; is that not happening? > I really don't see what the problem is is call simply calling fsync() > which clearly works? We want to make sure the problem's really fixed. --b. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:36 ` J. Bruce Fields @ 2011-10-19 18:38 ` Steve Dickson 2011-10-19 19:55 ` J. Bruce Fields 0 siblings, 1 reply; 16+ messages in thread From: Steve Dickson @ 2011-10-19 18:38 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing list On 10/19/2011 01:36 PM, J. Bruce Fields wrote: > On Wed, Oct 19, 2011 at 01:30:58PM -0400, Steve Dickson wrote: >> >> >> On 10/19/2011 01:22 PM, Jeff Layton wrote: >>> On Wed, 19 Oct 2011 13:10:19 -0400 >>> Steve Dickson <SteveD@redhat.com> wrote: >>> >>>> >>>> >>>> On 10/19/2011 12:36 PM, Jeff Layton wrote: >>>>> On Wed, 19 Oct 2011 11:34:30 -0400 >>>>> Steve Dickson <steved@redhat.com> wrote: >>>>> >>>>>> This patch is a following on to commit 7a802337. Using the >>>>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >>>>>> caused the fflush() and fclose() to fail in turn causing >>>>>> corruption in the mtab. >>>>>> >>>>>> The failures were in the internals of both calls. Switch those >>>>>> calls with the actual system calls eliminated the failures. >>>>>> >>>>>> Signed-off-by: Steve Dickson <steved@redhat.com> >>>>>> --- >>>>>> support/nfs/nfs_mntent.c | 4 ++-- >>>>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >>>>>> index a2118a2..b80f270 100644 >>>>>> --- a/support/nfs/nfs_mntent.c >>>>>> +++ b/support/nfs/nfs_mntent.c >>>>>> @@ -117,7 +117,7 @@ void >>>>>> nfs_endmntent (mntFILE *mfp) { >>>>>> if (mfp) { >>>>>> if (mfp->mntent_fp) >>>>>> - fclose(mfp->mntent_fp); >>>>>> + close(fileno(mfp->mntent_fp)); >>>>>> if (mfp->mntent_file) >>>>>> free(mfp->mntent_file); >>>>>> free(mfp); >>>>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >>>>>> free(m3); >>>>>> free(m4); >>>>>> if (res >= 0) { >>>>>> - res = fflush(mfp->mntent_fp); >>>>>> + res = fsync(fileno(mfp->mntent_fp)); >>>>> >>>>> fsync doesn't imply an fflush. With this, I think you may end up >>>>> without everything being committed to disk if part or all of it is >>>>> still in the file stream buffer. You probably want to do an fflush() >>>>> and then an fsync here. >>>> The problem was with the fflush() call. The call was causing the >>>> mount to drop core in turn causing mtab corruption. Changing that >>>> call to a fsync() worked just fine... no corruption... every time! >>>> >>> >>> Ahh, then you have another problem here too then. Most likely it was >>> crashing because it caught a SIGXFSZ. Writing out the mtab should not >>> be affected by signals. >> So calling fflush() generates a SIGXFSZ and call fsync() does not... > > fflush() must hit this because it's calling write() to write out the > stream buffer.... > > But lock_mtab() should have set SIGXFSZ to be ignored; is that not > happening? This is the case. The following patch cause SIGXFSZ to be ignored. diff -up ./utils/mount/fstab.c.orig ./utils/mount/fstab.c --- ./utils/mount/fstab.c.orig 2011-10-19 13:28:57.318132000 -0400 +++ ./utils/mount/fstab.c 2011-10-19 14:02:07.715039000 -0400 @@ -387,8 +387,9 @@ lock_mtab (void) { sa.sa_flags = 0; sigfillset (&sa.sa_mask); - while (sigismember (&sa.sa_mask, ++sig) != -1 - && sig != SIGCHLD) { + while (sigismember (&sa.sa_mask, ++sig) != -1) { + if (sig == SIGCHLD) + continue; if (sig == SIGALRM) sa.sa_handler = setlkw_timeout; else Now all this does is cause mount to ignore the fact that the write() during the fflush() fails, which causes the following warning during the umount. [mntent]: warning: no final newline at the end of /etc/mtab [mntent]: warning: no final newline at the end of /etc/mtab [mntent]: warning: no final newline at the end of /etc/mtab I can live with those warnings.... steved. > >> I really don't see what the problem is is call simply calling fsync() >> which clearly works? > > We want to make sure the problem's really fixed. > > --b. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 18:38 ` Steve Dickson @ 2011-10-19 19:55 ` J. Bruce Fields 2011-10-19 20:00 ` Steve Dickson 0 siblings, 1 reply; 16+ messages in thread From: J. Bruce Fields @ 2011-10-19 19:55 UTC (permalink / raw) To: Steve Dickson; +Cc: Jeff Layton, Linux NFS Mailing list On Wed, Oct 19, 2011 at 02:38:49PM -0400, Steve Dickson wrote: > > > On 10/19/2011 01:36 PM, J. Bruce Fields wrote: > > On Wed, Oct 19, 2011 at 01:30:58PM -0400, Steve Dickson wrote: > >> > >> > >> On 10/19/2011 01:22 PM, Jeff Layton wrote: > >>> On Wed, 19 Oct 2011 13:10:19 -0400 > >>> Steve Dickson <SteveD@redhat.com> wrote: > >>> > >>>> > >>>> > >>>> On 10/19/2011 12:36 PM, Jeff Layton wrote: > >>>>> On Wed, 19 Oct 2011 11:34:30 -0400 > >>>>> Steve Dickson <steved@redhat.com> wrote: > >>>>> > >>>>>> This patch is a following on to commit 7a802337. Using the > >>>>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >>>>>> caused the fflush() and fclose() to fail in turn causing > >>>>>> corruption in the mtab. > >>>>>> > >>>>>> The failures were in the internals of both calls. Switch those > >>>>>> calls with the actual system calls eliminated the failures. > >>>>>> > >>>>>> Signed-off-by: Steve Dickson <steved@redhat.com> > >>>>>> --- > >>>>>> support/nfs/nfs_mntent.c | 4 ++-- > >>>>>> 1 files changed, 2 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >>>>>> index a2118a2..b80f270 100644 > >>>>>> --- a/support/nfs/nfs_mntent.c > >>>>>> +++ b/support/nfs/nfs_mntent.c > >>>>>> @@ -117,7 +117,7 @@ void > >>>>>> nfs_endmntent (mntFILE *mfp) { > >>>>>> if (mfp) { > >>>>>> if (mfp->mntent_fp) > >>>>>> - fclose(mfp->mntent_fp); > >>>>>> + close(fileno(mfp->mntent_fp)); > >>>>>> if (mfp->mntent_file) > >>>>>> free(mfp->mntent_file); > >>>>>> free(mfp); > >>>>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >>>>>> free(m3); > >>>>>> free(m4); > >>>>>> if (res >= 0) { > >>>>>> - res = fflush(mfp->mntent_fp); > >>>>>> + res = fsync(fileno(mfp->mntent_fp)); > >>>>> > >>>>> fsync doesn't imply an fflush. With this, I think you may end up > >>>>> without everything being committed to disk if part or all of it is > >>>>> still in the file stream buffer. You probably want to do an fflush() > >>>>> and then an fsync here. > >>>> The problem was with the fflush() call. The call was causing the > >>>> mount to drop core in turn causing mtab corruption. Changing that > >>>> call to a fsync() worked just fine... no corruption... every time! > >>>> > >>> > >>> Ahh, then you have another problem here too then. Most likely it was > >>> crashing because it caught a SIGXFSZ. Writing out the mtab should not > >>> be affected by signals. > >> So calling fflush() generates a SIGXFSZ and call fsync() does not... > > > > fflush() must hit this because it's calling write() to write out the > > stream buffer.... > > > > But lock_mtab() should have set SIGXFSZ to be ignored; is that not > > happening? > This is the case. The following patch cause SIGXFSZ to be ignored. > > diff -up ./utils/mount/fstab.c.orig ./utils/mount/fstab.c > --- ./utils/mount/fstab.c.orig 2011-10-19 13:28:57.318132000 -0400 > +++ ./utils/mount/fstab.c 2011-10-19 14:02:07.715039000 -0400 > @@ -387,8 +387,9 @@ lock_mtab (void) { > sa.sa_flags = 0; > sigfillset (&sa.sa_mask); > > - while (sigismember (&sa.sa_mask, ++sig) != -1 > - && sig != SIGCHLD) { I'm very confused--the latest nfs-utils doesn't have this. Oh, I see: it was fixed by b3e190c4adfc9ec47567c968bd000d282d07b05e "mount: improve signal management when locking mtab". So if you're working with an older version then you probably want to backport that patch. --b. > + while (sigismember (&sa.sa_mask, ++sig) != -1) { > + if (sig == SIGCHLD) > + continue; > if (sig == SIGALRM) > sa.sa_handler = setlkw_timeout; > else > > Now all this does is cause mount to ignore the fact that > the write() during the fflush() fails, which causes the > following warning during the umount. > > [mntent]: warning: no final newline at the end of /etc/mtab > [mntent]: warning: no final newline at the end of /etc/mtab > [mntent]: warning: no final newline at the end of /etc/mtab > > I can live with those warnings.... > > steved. > > > > >> I really don't see what the problem is is call simply calling fsync() > >> which clearly works? > > > > We want to make sure the problem's really fixed. > > > > --b. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 19:55 ` J. Bruce Fields @ 2011-10-19 20:00 ` Steve Dickson 2011-10-19 20:01 ` J. Bruce Fields 0 siblings, 1 reply; 16+ messages in thread From: Steve Dickson @ 2011-10-19 20:00 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing list On 10/19/2011 03:55 PM, J. Bruce Fields wrote: > On Wed, Oct 19, 2011 at 02:38:49PM -0400, Steve Dickson wrote: >> >> >> On 10/19/2011 01:36 PM, J. Bruce Fields wrote: >>> On Wed, Oct 19, 2011 at 01:30:58PM -0400, Steve Dickson wrote: >>>> >>>> >>>> On 10/19/2011 01:22 PM, Jeff Layton wrote: >>>>> On Wed, 19 Oct 2011 13:10:19 -0400 >>>>> Steve Dickson <SteveD@redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On 10/19/2011 12:36 PM, Jeff Layton wrote: >>>>>>> On Wed, 19 Oct 2011 11:34:30 -0400 >>>>>>> Steve Dickson <steved@redhat.com> wrote: >>>>>>> >>>>>>>> This patch is a following on to commit 7a802337. Using the >>>>>>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >>>>>>>> caused the fflush() and fclose() to fail in turn causing >>>>>>>> corruption in the mtab. >>>>>>>> >>>>>>>> The failures were in the internals of both calls. Switch those >>>>>>>> calls with the actual system calls eliminated the failures. >>>>>>>> >>>>>>>> Signed-off-by: Steve Dickson <steved@redhat.com> >>>>>>>> --- >>>>>>>> support/nfs/nfs_mntent.c | 4 ++-- >>>>>>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >>>>>>>> index a2118a2..b80f270 100644 >>>>>>>> --- a/support/nfs/nfs_mntent.c >>>>>>>> +++ b/support/nfs/nfs_mntent.c >>>>>>>> @@ -117,7 +117,7 @@ void >>>>>>>> nfs_endmntent (mntFILE *mfp) { >>>>>>>> if (mfp) { >>>>>>>> if (mfp->mntent_fp) >>>>>>>> - fclose(mfp->mntent_fp); >>>>>>>> + close(fileno(mfp->mntent_fp)); >>>>>>>> if (mfp->mntent_file) >>>>>>>> free(mfp->mntent_file); >>>>>>>> free(mfp); >>>>>>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >>>>>>>> free(m3); >>>>>>>> free(m4); >>>>>>>> if (res >= 0) { >>>>>>>> - res = fflush(mfp->mntent_fp); >>>>>>>> + res = fsync(fileno(mfp->mntent_fp)); >>>>>>> >>>>>>> fsync doesn't imply an fflush. With this, I think you may end up >>>>>>> without everything being committed to disk if part or all of it is >>>>>>> still in the file stream buffer. You probably want to do an fflush() >>>>>>> and then an fsync here. >>>>>> The problem was with the fflush() call. The call was causing the >>>>>> mount to drop core in turn causing mtab corruption. Changing that >>>>>> call to a fsync() worked just fine... no corruption... every time! >>>>>> >>>>> >>>>> Ahh, then you have another problem here too then. Most likely it was >>>>> crashing because it caught a SIGXFSZ. Writing out the mtab should not >>>>> be affected by signals. >>>> So calling fflush() generates a SIGXFSZ and call fsync() does not... >>> >>> fflush() must hit this because it's calling write() to write out the >>> stream buffer.... >>> >>> But lock_mtab() should have set SIGXFSZ to be ignored; is that not >>> happening? >> This is the case. The following patch cause SIGXFSZ to be ignored. >> >> diff -up ./utils/mount/fstab.c.orig ./utils/mount/fstab.c >> --- ./utils/mount/fstab.c.orig 2011-10-19 13:28:57.318132000 -0400 >> +++ ./utils/mount/fstab.c 2011-10-19 14:02:07.715039000 -0400 >> @@ -387,8 +387,9 @@ lock_mtab (void) { >> sa.sa_flags = 0; >> sigfillset (&sa.sa_mask); >> >> - while (sigismember (&sa.sa_mask, ++sig) != -1 >> - && sig != SIGCHLD) { > > I'm very confused--the latest nfs-utils doesn't have this. > > Oh, I see: it was fixed by b3e190c4adfc9ec47567c968bd000d282d07b05e > "mount: improve signal management when locking mtab". > > So if you're working with an older version then you probably want to > backport that patch. Exactly... Read my last posting... steved. > > --b. > >> + while (sigismember (&sa.sa_mask, ++sig) != -1) { >> + if (sig == SIGCHLD) >> + continue; >> if (sig == SIGALRM) >> sa.sa_handler = setlkw_timeout; >> else >> >> Now all this does is cause mount to ignore the fact that >> the write() during the fflush() fails, which causes the >> following warning during the umount. >> >> [mntent]: warning: no final newline at the end of /etc/mtab >> [mntent]: warning: no final newline at the end of /etc/mtab >> [mntent]: warning: no final newline at the end of /etc/mtab >> >> I can live with those warnings.... >> >> steved. >> >>> >>>> I really don't see what the problem is is call simply calling fsync() >>>> which clearly works? >>> >>> We want to make sure the problem's really fixed. >>> >>> --b. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 20:00 ` Steve Dickson @ 2011-10-19 20:01 ` J. Bruce Fields 0 siblings, 0 replies; 16+ messages in thread From: J. Bruce Fields @ 2011-10-19 20:01 UTC (permalink / raw) To: Steve Dickson; +Cc: Jeff Layton, Linux NFS Mailing list On Wed, Oct 19, 2011 at 04:00:31PM -0400, Steve Dickson wrote: > > > On 10/19/2011 03:55 PM, J. Bruce Fields wrote: > > On Wed, Oct 19, 2011 at 02:38:49PM -0400, Steve Dickson wrote: > >> This is the case. The following patch cause SIGXFSZ to be ignored. > >> > >> diff -up ./utils/mount/fstab.c.orig ./utils/mount/fstab.c > >> --- ./utils/mount/fstab.c.orig 2011-10-19 13:28:57.318132000 -0400 > >> +++ ./utils/mount/fstab.c 2011-10-19 14:02:07.715039000 -0400 > >> @@ -387,8 +387,9 @@ lock_mtab (void) { > >> sa.sa_flags = 0; > >> sigfillset (&sa.sa_mask); > >> > >> - while (sigismember (&sa.sa_mask, ++sig) != -1 > >> - && sig != SIGCHLD) { > > > > I'm very confused--the latest nfs-utils doesn't have this. > > > > Oh, I see: it was fixed by b3e190c4adfc9ec47567c968bd000d282d07b05e > > "mount: improve signal management when locking mtab". > > > > So if you're working with an older version then you probably want to > > backport that patch. > Exactly... Read my last posting... I'm way behind! Thanks--b. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:30 ` Steve Dickson 2011-10-19 17:36 ` J. Bruce Fields @ 2011-10-19 17:40 ` Jeff Layton 2011-10-19 18:00 ` Steve Dickson 1 sibling, 1 reply; 16+ messages in thread From: Jeff Layton @ 2011-10-19 17:40 UTC (permalink / raw) To: Steve Dickson; +Cc: Linux NFS Mailing list On Wed, 19 Oct 2011 13:30:58 -0400 Steve Dickson <SteveD@redhat.com> wrote: > > > On 10/19/2011 01:22 PM, Jeff Layton wrote: > > On Wed, 19 Oct 2011 13:10:19 -0400 > > Steve Dickson <SteveD@redhat.com> wrote: > > > >> > >> > >> On 10/19/2011 12:36 PM, Jeff Layton wrote: > >>> On Wed, 19 Oct 2011 11:34:30 -0400 > >>> Steve Dickson <steved@redhat.com> wrote: > >>> > >>>> This patch is a following on to commit 7a802337. Using the > >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >>>> caused the fflush() and fclose() to fail in turn causing > >>>> corruption in the mtab. > >>>> > >>>> The failures were in the internals of both calls. Switch those > >>>> calls with the actual system calls eliminated the failures. > >>>> > >>>> Signed-off-by: Steve Dickson <steved@redhat.com> > >>>> --- > >>>> support/nfs/nfs_mntent.c | 4 ++-- > >>>> 1 files changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >>>> index a2118a2..b80f270 100644 > >>>> --- a/support/nfs/nfs_mntent.c > >>>> +++ b/support/nfs/nfs_mntent.c > >>>> @@ -117,7 +117,7 @@ void > >>>> nfs_endmntent (mntFILE *mfp) { > >>>> if (mfp) { > >>>> if (mfp->mntent_fp) > >>>> - fclose(mfp->mntent_fp); > >>>> + close(fileno(mfp->mntent_fp)); > >>>> if (mfp->mntent_file) > >>>> free(mfp->mntent_file); > >>>> free(mfp); > >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >>>> free(m3); > >>>> free(m4); > >>>> if (res >= 0) { > >>>> - res = fflush(mfp->mntent_fp); > >>>> + res = fsync(fileno(mfp->mntent_fp)); > >>> > >>> fsync doesn't imply an fflush. With this, I think you may end up > >>> without everything being committed to disk if part or all of it is > >>> still in the file stream buffer. You probably want to do an fflush() > >>> and then an fsync here. > >> The problem was with the fflush() call. The call was causing the > >> mount to drop core in turn causing mtab corruption. Changing that > >> call to a fsync() worked just fine... no corruption... every time! > >> > > > > Ahh, then you have another problem here too then. Most likely it was > > crashing because it caught a SIGXFSZ. Writing out the mtab should not > > be affected by signals. > So calling fflush() generates a SIGXFSZ and call fsync() does not... > I really don't see what the problem is is call simply calling fsync() > which clearly works? > Because they do different things. fflush() flushes out the (userspace) stream buffer to the kernel's pagecache. fsync() forces the pagecache to be written out and committed to disk. If you call fsync() while the data is still in the userspace buffer then it doesn't really do you any good. The data you wrote via fwrite() won't be committed to disk. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:40 ` Jeff Layton @ 2011-10-19 18:00 ` Steve Dickson 0 siblings, 0 replies; 16+ messages in thread From: Steve Dickson @ 2011-10-19 18:00 UTC (permalink / raw) To: Jeff Layton; +Cc: Linux NFS Mailing list On 10/19/2011 01:40 PM, Jeff Layton wrote: > On Wed, 19 Oct 2011 13:30:58 -0400 > Steve Dickson <SteveD@redhat.com> wrote: > >> >> >> On 10/19/2011 01:22 PM, Jeff Layton wrote: >>> On Wed, 19 Oct 2011 13:10:19 -0400 >>> Steve Dickson <SteveD@redhat.com> wrote: >>> >>>> >>>> >>>> On 10/19/2011 12:36 PM, Jeff Layton wrote: >>>>> On Wed, 19 Oct 2011 11:34:30 -0400 >>>>> Steve Dickson <steved@redhat.com> wrote: >>>>> >>>>>> This patch is a following on to commit 7a802337. Using the >>>>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >>>>>> caused the fflush() and fclose() to fail in turn causing >>>>>> corruption in the mtab. >>>>>> >>>>>> The failures were in the internals of both calls. Switch those >>>>>> calls with the actual system calls eliminated the failures. >>>>>> >>>>>> Signed-off-by: Steve Dickson <steved@redhat.com> >>>>>> --- >>>>>> support/nfs/nfs_mntent.c | 4 ++-- >>>>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >>>>>> index a2118a2..b80f270 100644 >>>>>> --- a/support/nfs/nfs_mntent.c >>>>>> +++ b/support/nfs/nfs_mntent.c >>>>>> @@ -117,7 +117,7 @@ void >>>>>> nfs_endmntent (mntFILE *mfp) { >>>>>> if (mfp) { >>>>>> if (mfp->mntent_fp) >>>>>> - fclose(mfp->mntent_fp); >>>>>> + close(fileno(mfp->mntent_fp)); >>>>>> if (mfp->mntent_file) >>>>>> free(mfp->mntent_file); >>>>>> free(mfp); >>>>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >>>>>> free(m3); >>>>>> free(m4); >>>>>> if (res >= 0) { >>>>>> - res = fflush(mfp->mntent_fp); >>>>>> + res = fsync(fileno(mfp->mntent_fp)); >>>>> >>>>> fsync doesn't imply an fflush. With this, I think you may end up >>>>> without everything being committed to disk if part or all of it is >>>>> still in the file stream buffer. You probably want to do an fflush() >>>>> and then an fsync here. >>>> The problem was with the fflush() call. The call was causing the >>>> mount to drop core in turn causing mtab corruption. Changing that >>>> call to a fsync() worked just fine... no corruption... every time! >>>> >>> >>> Ahh, then you have another problem here too then. Most likely it was >>> crashing because it caught a SIGXFSZ. Writing out the mtab should not >>> be affected by signals. >> So calling fflush() generates a SIGXFSZ and call fsync() does not... >> I really don't see what the problem is is call simply calling fsync() >> which clearly works? >> > > Because they do different things. fflush() flushes out the (userspace) > stream buffer to the kernel's pagecache. fsync() forces the pagecache > to be written out and committed to disk. Believe I understand the code...... > > If you call fsync() while the data is still in the userspace buffer > then it doesn't really do you any good. The data you wrote via fwrite() > won't be committed to disk. > So I'm thinking the reason the data makes it the disk is its already in the pagecache when the fsync() is called... but to agree with Bruce, the might just be luck... steved. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:10 ` Steve Dickson 2011-10-19 17:22 ` Jeff Layton @ 2011-10-19 17:28 ` J. Bruce Fields 2011-10-19 17:32 ` Steve Dickson 1 sibling, 1 reply; 16+ messages in thread From: J. Bruce Fields @ 2011-10-19 17:28 UTC (permalink / raw) To: Steve Dickson; +Cc: Jeff Layton, Linux NFS Mailing list On Wed, Oct 19, 2011 at 01:10:19PM -0400, Steve Dickson wrote: > > > On 10/19/2011 12:36 PM, Jeff Layton wrote: > > On Wed, 19 Oct 2011 11:34:30 -0400 > > Steve Dickson <steved@redhat.com> wrote: > > > >> This patch is a following on to commit 7a802337. Using the > >> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >> caused the fflush() and fclose() to fail in turn causing > >> corruption in the mtab. > >> > >> The failures were in the internals of both calls. Switch those > >> calls with the actual system calls eliminated the failures. > >> > >> Signed-off-by: Steve Dickson <steved@redhat.com> > >> --- > >> support/nfs/nfs_mntent.c | 4 ++-- > >> 1 files changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >> index a2118a2..b80f270 100644 > >> --- a/support/nfs/nfs_mntent.c > >> +++ b/support/nfs/nfs_mntent.c > >> @@ -117,7 +117,7 @@ void > >> nfs_endmntent (mntFILE *mfp) { > >> if (mfp) { > >> if (mfp->mntent_fp) > >> - fclose(mfp->mntent_fp); > >> + close(fileno(mfp->mntent_fp)); > >> if (mfp->mntent_file) > >> free(mfp->mntent_file); > >> free(mfp); > >> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >> free(m3); > >> free(m4); > >> if (res >= 0) { > >> - res = fflush(mfp->mntent_fp); > >> + res = fsync(fileno(mfp->mntent_fp)); > > > > fsync doesn't imply an fflush. With this, I think you may end up > > without everything being committed to disk if part or all of it is > > still in the file stream buffer. You probably want to do an fflush() > > and then an fsync here. > The problem was with the fflush() call. The call was causing the > mount to drop core in turn causing mtab corruption. Changing that > call to a fsync() worked just fine... no corruption... every time! Looking at man setrlimit.... If you're worried about RLIMIT_FSIZE, then I guess you need to handle SIGXFSZ and EFBIG returns from write(). It looks like lock_mtab() already sets SIGXFSZ to be ignored. Is a libc bug crashing fflush on EFBIG? Or maybe the ftruncate is a problem? Maybe you just want to do away with stream IO completely? --b. > > steved. > > > > >> if (res < 0) > >> /* Avoid leaving a corrupt mtab file */ > >> ftruncate(fileno(mfp->mntent_fp), length); > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:28 ` J. Bruce Fields @ 2011-10-19 17:32 ` Steve Dickson 2011-10-19 17:39 ` J. Bruce Fields 0 siblings, 1 reply; 16+ messages in thread From: Steve Dickson @ 2011-10-19 17:32 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing list On 10/19/2011 01:28 PM, J. Bruce Fields wrote: > On Wed, Oct 19, 2011 at 01:10:19PM -0400, Steve Dickson wrote: >> >> >> On 10/19/2011 12:36 PM, Jeff Layton wrote: >>> On Wed, 19 Oct 2011 11:34:30 -0400 >>> Steve Dickson <steved@redhat.com> wrote: >>> >>>> This patch is a following on to commit 7a802337. Using the >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 >>>> caused the fflush() and fclose() to fail in turn causing >>>> corruption in the mtab. >>>> >>>> The failures were in the internals of both calls. Switch those >>>> calls with the actual system calls eliminated the failures. >>>> >>>> Signed-off-by: Steve Dickson <steved@redhat.com> >>>> --- >>>> support/nfs/nfs_mntent.c | 4 ++-- >>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c >>>> index a2118a2..b80f270 100644 >>>> --- a/support/nfs/nfs_mntent.c >>>> +++ b/support/nfs/nfs_mntent.c >>>> @@ -117,7 +117,7 @@ void >>>> nfs_endmntent (mntFILE *mfp) { >>>> if (mfp) { >>>> if (mfp->mntent_fp) >>>> - fclose(mfp->mntent_fp); >>>> + close(fileno(mfp->mntent_fp)); >>>> if (mfp->mntent_file) >>>> free(mfp->mntent_file); >>>> free(mfp); >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { >>>> free(m3); >>>> free(m4); >>>> if (res >= 0) { >>>> - res = fflush(mfp->mntent_fp); >>>> + res = fsync(fileno(mfp->mntent_fp)); >>> >>> fsync doesn't imply an fflush. With this, I think you may end up >>> without everything being committed to disk if part or all of it is >>> still in the file stream buffer. You probably want to do an fflush() >>> and then an fsync here. >> The problem was with the fflush() call. The call was causing the >> mount to drop core in turn causing mtab corruption. Changing that >> call to a fsync() worked just fine... no corruption... every time! > > Looking at man setrlimit.... If you're worried about RLIMIT_FSIZE, then > I guess you need to handle SIGXFSZ and EFBIG returns from write(). > > It looks like lock_mtab() already sets SIGXFSZ to be ignored. Is a libc > bug crashing fflush on EFBIG? Or maybe the ftruncate is a problem? > Maybe you just want to do away with stream IO completely? Calling fsync() clearly works. So why are looking for alternative solutions?? steved. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 17:32 ` Steve Dickson @ 2011-10-19 17:39 ` J. Bruce Fields 0 siblings, 0 replies; 16+ messages in thread From: J. Bruce Fields @ 2011-10-19 17:39 UTC (permalink / raw) To: Steve Dickson; +Cc: Jeff Layton, Linux NFS Mailing list On Wed, Oct 19, 2011 at 01:32:41PM -0400, Steve Dickson wrote: > > > On 10/19/2011 01:28 PM, J. Bruce Fields wrote: > > On Wed, Oct 19, 2011 at 01:10:19PM -0400, Steve Dickson wrote: > >> > >> > >> On 10/19/2011 12:36 PM, Jeff Layton wrote: > >>> On Wed, 19 Oct 2011 11:34:30 -0400 > >>> Steve Dickson <steved@redhat.com> wrote: > >>> > >>>> This patch is a following on to commit 7a802337. Using the > >>>> tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > >>>> caused the fflush() and fclose() to fail in turn causing > >>>> corruption in the mtab. > >>>> > >>>> The failures were in the internals of both calls. Switch those > >>>> calls with the actual system calls eliminated the failures. > >>>> > >>>> Signed-off-by: Steve Dickson <steved@redhat.com> > >>>> --- > >>>> support/nfs/nfs_mntent.c | 4 ++-- > >>>> 1 files changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > >>>> index a2118a2..b80f270 100644 > >>>> --- a/support/nfs/nfs_mntent.c > >>>> +++ b/support/nfs/nfs_mntent.c > >>>> @@ -117,7 +117,7 @@ void > >>>> nfs_endmntent (mntFILE *mfp) { > >>>> if (mfp) { > >>>> if (mfp->mntent_fp) > >>>> - fclose(mfp->mntent_fp); > >>>> + close(fileno(mfp->mntent_fp)); > >>>> if (mfp->mntent_file) > >>>> free(mfp->mntent_file); > >>>> free(mfp); > >>>> @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > >>>> free(m3); > >>>> free(m4); > >>>> if (res >= 0) { > >>>> - res = fflush(mfp->mntent_fp); > >>>> + res = fsync(fileno(mfp->mntent_fp)); > >>> > >>> fsync doesn't imply an fflush. With this, I think you may end up > >>> without everything being committed to disk if part or all of it is > >>> still in the file stream buffer. You probably want to do an fflush() > >>> and then an fsync here. > >> The problem was with the fflush() call. The call was causing the > >> mount to drop core in turn causing mtab corruption. Changing that > >> call to a fsync() worked just fine... no corruption... every time! > > > > Looking at man setrlimit.... If you're worried about RLIMIT_FSIZE, then > > I guess you need to handle SIGXFSZ and EFBIG returns from write(). > > > > It looks like lock_mtab() already sets SIGXFSZ to be ignored. Is a libc > > bug crashing fflush on EFBIG? Or maybe the ftruncate is a problem? > > Maybe you just want to do away with stream IO completely? > Calling fsync() clearly works. Until we understand why it works, it may be just luck. --b. > So why are looking for alternative > solutions?? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write 2011-10-19 15:34 [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write Steve Dickson 2011-10-19 16:36 ` Jeff Layton @ 2011-10-19 19:44 ` Steve Dickson 1 sibling, 0 replies; 16+ messages in thread From: Steve Dickson @ 2011-10-19 19:44 UTC (permalink / raw) To: Steve Dickson; +Cc: Linux NFS Mailing list It turns out this is only a RHEL6 bug. I (stupidly) made the assumption that since the RHEL6 patch cleanly applied to upstream, both streams worked in the same way... which is not the case. With upstream, this mtab code is not even compiled when the libmount code is enabled. Secondly, with later systems /etc/mtab is symbolically linked to /proc/mounts, making the file read-only. Finally when /etc/mtabs is a regular file and is writeable, the SIGXFSZ signal is handled properly in lock_mtab() so there is no corruption. Sorry for the noise. steved. On 10/19/2011 11:34 AM, Steve Dickson wrote: > This patch is a following on to commit 7a802337. Using the > tool in https://bugzilla.redhat.com/show_bug.cgi?id=695916 > caused the fflush() and fclose() to fail in turn causing > corruption in the mtab. > > The failures were in the internals of both calls. Switch those > calls with the actual system calls eliminated the failures. > > Signed-off-by: Steve Dickson <steved@redhat.com> > --- > support/nfs/nfs_mntent.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/support/nfs/nfs_mntent.c b/support/nfs/nfs_mntent.c > index a2118a2..b80f270 100644 > --- a/support/nfs/nfs_mntent.c > +++ b/support/nfs/nfs_mntent.c > @@ -117,7 +117,7 @@ void > nfs_endmntent (mntFILE *mfp) { > if (mfp) { > if (mfp->mntent_fp) > - fclose(mfp->mntent_fp); > + close(fileno(mfp->mntent_fp)); > if (mfp->mntent_file) > free(mfp->mntent_file); > free(mfp); > @@ -147,7 +147,7 @@ nfs_addmntent (mntFILE *mfp, struct mntent *mnt) { > free(m3); > free(m4); > if (res >= 0) { > - res = fflush(mfp->mntent_fp); > + res = fsync(fileno(mfp->mntent_fp)); > if (res < 0) > /* Avoid leaving a corrupt mtab file */ > ftruncate(fileno(mfp->mntent_fp), length); ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-10-19 20:01 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-10-19 15:34 [PATCH 1/1] mount.nfs: mtab corruption when RLIMIT_FSIZE causes a partial write Steve Dickson 2011-10-19 16:36 ` Jeff Layton 2011-10-19 17:10 ` Steve Dickson 2011-10-19 17:22 ` Jeff Layton 2011-10-19 17:30 ` Steve Dickson 2011-10-19 17:36 ` J. Bruce Fields 2011-10-19 18:38 ` Steve Dickson 2011-10-19 19:55 ` J. Bruce Fields 2011-10-19 20:00 ` Steve Dickson 2011-10-19 20:01 ` J. Bruce Fields 2011-10-19 17:40 ` Jeff Layton 2011-10-19 18:00 ` Steve Dickson 2011-10-19 17:28 ` J. Bruce Fields 2011-10-19 17:32 ` Steve Dickson 2011-10-19 17:39 ` J. Bruce Fields 2011-10-19 19:44 ` Steve Dickson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).