From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@redhat.com>,
lsf-pc <lsf-pc@lists.linuxfoundation.org>,
Andres Freund <andres@anarazel.de>,
Andreas Dilger <adilger@dilger.ca>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
"Joshua D. Drake" <jd@commandprompt.com>
Subject: Re: fsync() errors is unsafe and risks data loss
Date: Fri, 13 Apr 2018 19:38:14 -0700
Message-ID: <20180414023814.GB997@bombadil.infradead.org>
In-Reply-To: <20180414014752.GG23861@dastard>
On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote:
> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote:
> > 1. If we get an error while wbc->for_background is true, we should not clear
> > uptodate on the page, rather SetPageError and SetPageDirty.
>
> So you're saying we should treat it as a transient error rather than
> a permanent error.
Yes, I'm proposing leaving the data in memory in case the user wants to
try writing it somewhere else.
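A minimal user-space sketch of points 1 and 2. It assumes a hypothetical struct page_model whose booleans stand in for PG_uptodate, PG_dirty and PG_error. The helper names are illustrative only and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the page state discussed above.
 * These are NOT the kernel's struct page or PageFoo() helpers. */
struct page_model {
    bool uptodate;  /* cached data is valid */
    bool dirty;     /* data still needs to be written */
    bool error;     /* last writeback attempt failed */
};

/* Point 1: a failed background writeback keeps the data in memory.
 * The page is marked errored and redirtied instead of losing uptodate. */
static void background_writeback_failed(struct page_model *p)
{
    assert(p->uptodate);  /* only valid cached data is written back */
    p->error = true;      /* analogous to SetPageError() */
    p->dirty = true;      /* analogous to SetPageDirty() */
    /* uptodate is deliberately left set, unlike ClearPageUptodate() */
}

/* Point 2 (as originally stated): background writeback skips pages
 * that are already marked as errored. */
static bool background_should_write(const struct page_model *p)
{
    return p->dirty && !p->error;
}
```

Leaving uptodate set is what keeps the cached copy readable, so it can still be written out later, or somewhere else.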
> > 2. Background writebacks should skip pages which are PageError.
>
> That seems decidedly dodgy in the case where there is a transient
> error - it requires a user to specifically run sync to get the data
> to disk after the transient error has occurred. Say they don't
> notice the problem because it's fleeting and doesn't cause any
> obvious problems?
That's fair. What I want to avoid is triggering the same error every
30 seconds (or whatever the periodic writeback threshold is set to).
> e.g. XFS gets to enospc, runs out of reserve pool blocks so can't
> allocate space to write back the page, then space is freed up a few
> seconds later and so the next write will work just fine.
>
> This is a recipe for "I lost data that I wrote /days/ before the
> system crashed" bug reports.
So ... exponential backoff on retries?
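One way to read the backoff suggestion, sketched in user-space C: each failed background attempt doubles the wait before the next one, up to a cap, and a successful write resets it. The struct, the helper names and the RETRY_BASE_SECS / RETRY_MAX_SECS values are hypothetical, not existing kernel tunables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants, not kernel tunables. */
#define RETRY_BASE_SECS 30u    /* assumed to match the periodic writeback interval */
#define RETRY_MAX_SECS  3600u  /* assumed cap so retries never stop entirely */

struct retry_state {
    unsigned int delay_secs;        /* current backoff interval, 0 = no error */
    unsigned long next_retry_secs;  /* earliest time of the next background retry */
};

/* Record a failed background writeback at time `now`; double the delay. */
static void retry_failed(struct retry_state *rs, unsigned long now)
{
    if (rs->delay_secs == 0)
        rs->delay_secs = RETRY_BASE_SECS;
    else if (rs->delay_secs < RETRY_MAX_SECS / 2)
        rs->delay_secs *= 2;
    else
        rs->delay_secs = RETRY_MAX_SECS;
    rs->next_retry_secs = now + rs->delay_secs;
    assert(rs->delay_secs <= RETRY_MAX_SECS);
}

/* A successful write clears the backoff entirely. */
static void retry_succeeded(struct retry_state *rs)
{
    rs->delay_secs = 0;
    rs->next_retry_secs = 0;
}

/* Should background writeback try this page again at time `now`? */
static bool retry_due(const struct retry_state *rs, unsigned long now)
{
    return now >= rs->next_retry_secs;
}
```

Unlike skipping PageError pages outright, this still retries without user action, so a transient ENOSPC that clears shortly afterwards gets written without anyone running sync, while a persistent error is hit far less often than every 30 seconds.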
> > 3. for_sync writebacks should attempt one last write. Maybe it'll
> > succeed this time. If it does, just ClearPageError. If not, we have
> > somebody to report this writeback error to, and ClearPageUptodate.
>
> Which may well be unmount. Are we really going to wait until unmount
> to report fatal errors?
Goodness, no. The errors would be immediately reportable using the wb_err
mechanism, as soon as the first error was encountered.
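A simplified model of point 3 combined with immediate reporting through a wb_err-like sequence counter. The kernel's errseq_t packs the errno, a "seen" flag and a counter into one 32-bit word and only bumps the counter once the previous error has been observed; this sketch keeps them as plain fields and always bumps. All structs and helpers here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's errseq_t layout. */
struct mapping_model {
    int last_err;          /* most recent writeback error (negative errno) */
    unsigned int err_seq;  /* incremented each time an error is recorded */
};

struct file_model {
    struct mapping_model *mapping;
    unsigned int sampled_seq;  /* err_seq seen at open or at the last fsync */
};

struct page_model {
    bool uptodate, dirty, error;
};

/* Record the error immediately so every open file can observe it. */
static void record_error(struct mapping_model *m, int err)
{
    assert(err < 0);
    m->last_err = err;
    m->err_seq++;
}

/* First failure, in background writeback: report now, keep the data. */
static void background_failed(struct mapping_model *m, struct page_model *p)
{
    p->error = true;
    p->dirty = true;
    record_error(m, -EIO);
}

/* Point 3: for_sync writeback makes one last attempt on an errored
 * page; `write_ok` stands in for the outcome of that attempt. */
static void sync_last_attempt(struct mapping_model *m, struct page_model *p,
                              bool write_ok)
{
    if (!p->error)
        return;
    p->dirty = false;
    if (write_ok) {
        p->error = false;     /* analogous to ClearPageError() */
    } else {
        p->uptodate = false;  /* analogous to ClearPageUptodate() */
        record_error(m, -EIO);
    }
}

/* fsync(): report each new error once per open file, then advance. */
static int fsync_check(struct file_model *f)
{
    struct mapping_model *m = f->mapping;

    if (f->sampled_seq == m->err_seq)
        return 0;
    f->sampled_seq = m->err_seq;
    return m->last_err;
}
```

Because background_failed() records the error at the first failure, an fsync() on any open file reports it right away rather than at unmount; sync_last_attempt() only decides whether the cached data survives.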