From: Martin Steigerwald <martin@lichtvoll.de>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "Joshua D. Drake" <jd@commandprompt.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: fsync() errors is unsafe and risks data loss
Date: Tue, 10 Apr 2018 21:47:21 +0200
Message-ID: <14942494.44S1RI7MjI@merkaba>
In-Reply-To: <20180410184356.GD3563@thunk.org>
Hi Theodore, Darrick, Joshua.
CC'd fsdevel, as this does not appear to be ext4-specific to me (nor to you,
Theodore).
Theodore Y. Ts'o - 10.04.18, 20:43:
> This isn't actually an ext4 issue, but a long-standing VFS/MM issue.
[…]
> First of all, what storage devices will do when they hit an exception
> condition is quite non-deterministic. For example, the vast majority
> of SSD's are not power fail certified. What this means is that if
> they suffer a power drop while they are doing a GC, it is quite
> possible for data written six months ago to be lost as a result. The
> LBA could potentially be far, far away from any LBA's that were
> recently written, and there could have been multiple CACHE FLUSH
> operations since the LBA in question was last written six
> months ago. No matter; for a consumer-grade SSD, it's possible for
> that LBA to be trashed after an unexpected power drop.
Guh. I was not aware of this. I knew consumer-grade SSDs often do not have
power loss protection, but still thought they'd handle garbage collection in an
atomic way. Sometimes I am tempted to sing an "all hardware is crap" song
(starting with Meltdown/Spectre, then probably heading over to storage devices
and so on… including firmware crap like Intel ME).
> Next, the reason why fsync() has the behaviour that it does is that
> one of the most common cases of I/O storage errors in buffered use
> cases, certainly as seen by the community distros, is the user who
> pulls out a USB stick while it is in use. In that case, if there are
> dirtied pages in the page cache, the question is what can you do?
> Sooner or later the writes will time out, and if you leave the pages
> dirty, then it effectively becomes a permanent memory leak. You can't
> unmount the file system --- that requires writing out all of the pages
> such that the dirty bit is turned off. And if you don't clear the
> dirty bit on an I/O error, then they can never be cleaned. You can't
> even re-insert the USB stick; the re-inserted USB stick will get a new
> block device. Worse, when the USB stick was pulled, it will have
> suffered a power drop, and see above about what could happen after a
> power drop for non-power fail certified flash devices --- it goes
> double for the cheap sh*t USB sticks found in the checkout aisle of
> Micro Center.
From the original PostgreSQL mailing list thread I did not understand how
exactly FreeBSD differs in behavior compared to Linux. I am aware of one
operating system that, from a user's point of view, handles this in almost the
right way IMHO: AmigaOS.
When you removed a floppy disk from the drive while the OS was writing to it,
it showed a "You MUST insert volume somename into drive somedrive:" message,
and if you did, it just continued writing. (The part that did not work well was
that with the original filesystem, if you did not insert it back, the whole
disk was corrupted, usually beyond repair, so the "MUST" was no joke.)
In my opinion, from a user's point of view this is the only sane way to handle
the premature removal of removable media. I have read of a GSoC project to
implement something like this for NetBSD, but I did not check on its outcome.
I think MS-DOS had something similar; however, MS-DOS is not a multitasking
operating system as AmigaOS is.
Implementing something like this for Linux would be quite a feat, I think,
because in addition to the implementation in the kernel, the desktop
environment or whatever other userspace you use would need to handle it as
well, so you'd have to adapt udev / udisks / probably systemd. This behavior
would probably need to be restricted to devices that are really removable.
Even then, to prevent memory exhaustion in case processes continue to write to
a removed and not yet re-inserted USB hard disk, the kernel would need to halt
processes that issue writes to this device. (I believe this is what AmigaOS
did: it just blocked all subsequent I/O to the device until it was re-inserted.
But then the I/O handling in that OS at that time was quite different from what
Linux does.)
> So this is the explanation for why Linux handles I/O errors by
> clearing the dirty bit after reporting the error up to user space.
> And why there is no eagerness to solve the problem simply by "don't
> clear the dirty bit". For every one Postgres installation that might
> have a better recovery after an I/O error, there's probably a thousand
> clueless Fedora and Ubuntu users who will have a much worse user
> experience after a USB stick pull happens.
I was not aware that flash-based media may be as crappy as you hint at.
From my tests with AmigaOS 4.something or AmigaOS 3.9 + 3rd-party Poseidon USB
stack, the above mechanism worked even with USB sticks. However, I did not test
this often, and I did not check for data corruption after a test.
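To make the behaviour quoted above concrete (the error is reported to user
space once and the dirty bit is cleared), here is a minimal sketch in C of how
an application might react to a failed fsync(). The helper name write_durably()
is hypothetical and only for illustration; it is not taken from PostgreSQL or
from the kernel. The assumption is that the caller still holds its own copy of
the data and can redo the whole write after a failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path and flush them.
 * Returns 0 on success, -1 on any error (errno is preserved). */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try the write again */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* short write: continue with the rest */
        left -= (size_t)n;
    }

    if (fsync(fd) != 0) {
        /* Do not simply call fsync() again here. As described in the
         * quoted text, Linux clears the dirty bit after reporting the
         * error, so a later fsync() may find nothing left to write and
         * return 0 although the data never reached the device. Treat
         * the failure as final; the caller has to redo the write from
         * its own copy of the data. */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return close(fd) != 0 ? -1 : 0;
}
```

A caller would check the return value, for example
write_durably("/some/file", data, size), and on -1 either rewrite everything
from its own buffer or give up, rather than retrying only the fsync().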
Thanks,
--
Martin