From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
    "bfields@redhat.com" <bfields@redhat.com>
Subject: Re: [PATCH 0/3] Handling NFSv3 I/O errors in knfsd
Date: Mon, 26 Aug 2019 20:48:11 -0400
Message-ID: <20190827004811.GA30827@fieldses.org>
In-Reply-To: <ef9f2791ef395d7c968a386ce0a32ea503d6478f.camel@hammerspace.com>

On Mon, Aug 26, 2019 at 09:02:31PM +0000, Trond Myklebust wrote:
> On Mon, 2019-08-26 at 16:51 -0400, J. Bruce Fields wrote:
> > On Mon, Aug 26, 2019 at 12:50:18PM -0400, Trond Myklebust wrote:
> > > Recently, a number of changes went into the kernel to try to
> > > ensure that I/O errors (specifically write errors) are reported to
> > > the application once and only once. The vehicle for ensuring the
> > > errors are reported is the struct file, which uses the 'f_wb_err'
> > > field to track which errors have been reported.
> > >
> > > The problem is that errors are mainly intended to be reported
> > > through fsync(). If the client is doing synchronous writes, then
> > > all is well, but if it is doing unstable writes, then the errors
> > > may not be reported until the client calls COMMIT. If the file
> > > cache has thrown out the struct file, due to memory pressure, or
> > > just because the client took a long while between the last WRITE
> > > and the COMMIT, then the error report may be lost, and the client
> > > may just think its data is safely stored.
> >
> > These were lost before the file caching patches as well, right? Or
> > is there some regression?
>
> Correct. This is not a regression, but an attempt to fix a problem
> that has existed for some time now.
>
> >
> > > Note that the problem is compounded by the fact that NFSv3 is
> > > stateless, so the server never knows that the client may have
> > > rebooted, so there can be no guarantee that a COMMIT will ever be
> > > sent.
> > >
> > > The following patch set attempts to remedy the situation using 2
> > > strategies:
> > >
> > > 1) If the inode is dirty, then avoid garbage collecting the file
> > >    from the file cache.
> > > 2) If the file is closed, and we see that it would have reported
> > >    an error to COMMIT, then we bump the boot verifier in order to
> > >    ensure the client retransmits all its writes.
> >
> > Sounds sensible to me.
> >
> > > Note that if multiple clients were writing to the same file, then
> > > we probably want to bump the boot verifier anyway, since only one
> > > COMMIT will see the error report (because the cached file is also
> > > shared).
> >
> > I'm confused by the "probably should". So that's future work? I
> > guess it'd mean some additional work to identify that case. You
> > can't really even distinguish clients in the NFSv3 case, but I
> > suppose you could use IP address or TCP connection as an
> > approximation.
>
> I'm suggesting we should do this too, but I haven't done so yet in
> these patches. I'd like to hear other opinions (particularly from you,
> Chuck and Jeff).

Does this process actually converge, or do we end up with all the
clients retrying the writes and, again, only one of them getting the
error?

I wonder what the typical errors are, anyway.

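For concreteness, the multi-client scenario can be sketched as a toy
userspace model. This is not the kernel's errseq_t or nfsd code: the
SharedFile class, the commit_round() helper and the integer verifier are
all made up for illustration, and the model assumes every client wrote
unstable data before a single writeback failure.

```python
class SharedFile:
    """Toy stand-in for one cached struct file shared by all clients."""

    def __init__(self):
        self.wb_err = 0    # bumped once per writeback error (hypothetical)
        self.reported = 0  # last error already handed out; plays the role of f_wb_err

    def writeback_fails(self):
        self.wb_err += 1

    def commit(self):
        """Return True if this COMMIT reports an error; each error is reported once."""
        if self.reported != self.wb_err:
            self.reported = self.wb_err
            return True
        return False


def commit_round(clients, bump_on_error):
    """Return the clients that notice a problem when each sends one COMMIT."""
    shared = SharedFile()
    verifier = 1
    # Verifier each client saw in the replies to its unstable WRITEs.
    write_verf = {c: verifier for c in clients}
    shared.writeback_fails()

    noticed = []
    for c in clients:
        got_error = shared.commit()
        if got_error and bump_on_error:
            verifier += 1  # server resets its boot verifier
        # A client notices via an error status or a verifier mismatch.
        if got_error or write_verf[c] != verifier:
            noticed.append(c)
    return noticed


print(commit_round(["A", "B", "C"], bump_on_error=False))
print(commit_round(["A", "B", "C"], bump_on_error=True))
```

Without the verifier bump only the first client to COMMIT notices
anything; with it, the other clients see a verifier mismatch and would
retransmit. The sketch covers a single round only and says nothing about
whether repeated rounds converge if the underlying error persists.
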
--b.