From: Martin Steigerwald <martin@lichtvoll.de>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: "Joshua D. Drake" <jd@commandprompt.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: fsync() errors is unsafe and risks data loss
Date: Tue, 10 Apr 2018 21:47:21 +0200
Message-ID: <14942494.44S1RI7MjI@merkaba>
In-Reply-To: <20180410184356.GD3563@thunk.org>

Hi Theodore, Darrick, Joshua.

CC'd fsdevel, as this does not appear to be Ext4-specific to me (nor to you,
Theodore).

Theodore Y. Ts'o - 10.04.18, 20:43:
> This isn't actually an ext4 issue, but a long-standing VFS/MM issue.
[…]
> First of all, what storage devices will do when they hit an exception
> condition is quite non-deterministic.  For example, the vast majority
> of SSD's are not power fail certified.  What this means is that if
> they suffer a power drop while they are doing a GC, it is quite
> possible for data written six months ago to be lost as a result.  The
> LBA could potentially be far, far away from any LBA's that were
> recently written, and there could have been multiple CACHE FLUSH
> operations in the time since the LBA in question was last written six
> months ago.  No matter; for a consumer-grade SSD, it's possible for
> that LBA to be trashed after an unexpected power drop.

Guh. I was not aware of this. I knew consumer-grade SSDs often do not have
power loss protection, but still thought they'd handle garbage collection in an
atomic way. Sometimes I am tempted to sing an "all hardware is crap" song 
(starting with Meltdown/Spectre, then probably heading over to storage devices 
and so on… including firmware crap like Intel ME).

> Next, the reason why fsync() has the behaviour that it does is that one
> of the most common cases of I/O storage errors in buffered use
> cases, certainly as seen by the community distros, is the user who
> pulls out a USB stick while it is in use.  In that case, if there are
> dirtied pages in the page cache, the question is what can you do?
> Sooner or later the writes will time out, and if you leave the pages
> dirty, then it effectively becomes a permanent memory leak.  You can't
> unmount the file system --- that requires writing out all of the pages
> such that the dirty bit is turned off.  And if you don't clear the
> dirty bit on an I/O error, then they can never be cleaned.  You can't
> even re-insert the USB stick; the re-inserted USB stick will get a new
> block device.  Worse, when the USB stick was pulled, it will have
> suffered a power drop, and see above about what could happen after a
> power drop for non-power fail certified flash devices --- it goes
> double for the cheap sh*t USB sticks found in the checkout aisle of
> Micro Center.

From the original PostgreSQL mailing list thread I did not understand how exactly
FreeBSD differs in behavior compared to Linux. I am aware of one operating
system that, from a user's point of view, handles this in almost the right way
IMHO: AmigaOS.

When you removed a floppy disk from the drive while the OS was writing to it,
it showed a "You MUST insert volume somename into drive somedrive:" message, and
if you did, it just continued writing. (The part that did not work well was that
with the original filesystem, if you did not insert it back, the whole disk was
corrupted, usually beyond repair, so the "MUST" was no joke.)

In my opinion, from a user's point of view this is the only sane way to handle
the premature removal of removable media. I have read of a GSoC project to
implement something like this for NetBSD, but I did not check on its outcome.
I think MS-DOS had something similar; however, MS-DOS is not a multitasking
operating system as AmigaOS is.

Implementing something like this for Linux would be quite a feat, I think,
because in addition to the implementation in the kernel, the desktop environment
or whatever other userspace you use would need to handle it as well, so you'd
have to adapt udev / udisks / probably systemd. This behavior would probably
need to be restricted to devices that are really removable. Even then, to
prevent memory exhaustion when processes continue to write to a removed and
not yet re-inserted USB hard disk, the kernel would need to halt the processes
that dirty pages for this device. (I believe this is what AmigaOS did.
It just blocked all subsequent I/O to the device until it was re-inserted. But
then the I/O handling in that OS at that time was quite different from what
Linux does.)

> So this is the explanation for why Linux handles I/O errors by
> clearing the dirty bit after reporting the error up to user space.
> And why there is no eagerness to solve the problem simply by "don't
> clear the dirty bit".  For every one Postgres installation that might
> have a better recovery after an I/O error, there's probably a thousand
> clueless Fedora and Ubuntu users who will have a much worse user
> experience after a USB stick pull happens.
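
For application authors, the practical consequence of the behaviour quoted
above can be sketched roughly as follows. This is only a minimal Python
sketch, not code from the kernel or from PostgreSQL. The names durable_write
and DurabilityError and the fsync parameter are hypothetical, chosen for
illustration. It assumes that once fsync() has reported an error, the affected
dirty pages may already have been marked clean, so a second fsync() could
return success without the data ever having reached stable storage.

```python
import os


class DurabilityError(Exception):
    """fsync() failed; the data written so far must be treated as lost."""


def durable_write(path, data, fsync=os.fsync):
    """Write data to path and call fsync exactly once.

    The fsync argument is a hypothetical hook for fault injection in tests;
    it defaults to os.fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Assumption: the kernel may have cleared the dirty bit after
            # reporting this error, so retrying fsync() here could succeed
            # even though the data never reached the device. Do not retry;
            # report the failure and let the caller redo the write from its
            # own copy of the data (or abort).
            raise DurabilityError(f"fsync failed for {path!r}") from exc
    finally:
        os.close(fd)
```

A caller would treat DurabilityError as "start over from a known-good state"
rather than as a transient condition to loop on.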

I was not aware that flash based media may be as crappy as you hint at.

From my tests with AmigaOS 4.something or AmigaOS 3.9 + 3rd party Poseidon USB
stack, the above mechanism worked even with USB sticks. However, I did not test
this often, and I did not check for data corruption after a test.

Thanks,
-- 
Martin