From: Matthew Wilcox <willy@infradead.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Sitsofe Wheeler <sitsofe@gmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
drh@sqlite.org, Jan Kara <jack@suse.cz>,
Dave Chinner <david@fromorbit.com>, Theodore Tso <tytso@mit.edu>,
harshad shirwadkar <harshadshirwadkar@gmail.com>
Subject: Re: Questions about filesystems from SQLite author presentation
Date: Mon, 6 Jan 2020 08:42:57 -0800
Message-ID: <20200106164257.GJ6788@bombadil.infradead.org>
In-Reply-To: <CAOQ4uxhJhzUj_sjhDknGzdLs6kOXzt3GO2vyCzmuBNTSsAQLGA@mail.gmail.com>
On Mon, Jan 06, 2020 at 05:40:20PM +0200, Amir Goldstein wrote:
> On Mon, Jan 6, 2020 at 9:26 AM Sitsofe Wheeler <sitsofe@gmail.com> wrote:
> > If a write occurs on one or two bytes of a file at about the same time as a power
> > loss, are other bytes of the file guaranteed to be unchanged after reboot?
> > Or might some other bytes within the same sector have been modified as well?
>
> I don't see how other bytes could change in this scenario, but I don't
> know if the hardware provides this guarantee. Maybe someone else knows
> the answer.
The question is nonsense because there is no way to write less than one
sector to a hardware device, by definition. So, treating this question
as being a read-modify-write of a single sector (assuming the "two bytes"
don't cross a sector boundary):
Hardware vendors are reluctant to provide this guarantee, but it's
essential to constructing a reliable storage system. We wrote the NVMe
spec in such a way that vendors must provide single-sector-atomicity
guarantees, and I hope they haven't managed to wiggle some nonsense
into the spec that allows them to not make that guarantee. The below
is a quote from the 1.4 spec. For those not versed in NVMe spec-ese,
"0's based value" means that putting a zero in this field means the
value of AWUPF is 1.

Atomic Write Unit Power Fail (AWUPF): This field indicates the size of
the write operation guaranteed to be written atomically to the NVM across
all namespaces with any supported namespace format during a power fail
or error condition.

If a specific namespace guarantees a larger size than is reported in
this field, then this namespace specific size is reported in the NAWUPF
field in the Identify Namespace data structure. Refer to section 6.4.

This field is specified in logical blocks and is a 0’s based value. The
AWUPF value shall be less than or equal to the AWUN value.

If a write command is submitted with size less than or equal to the
AWUPF value, the host is guaranteed that the write is atomic to the
NVM with respect to other read or write commands. If a write command
is submitted that is greater than this size, there is no guarantee of
command atomicity. If the write size is less than or equal to the AWUPF
value and the write command fails, then subsequent read commands for the
associated logical blocks shall return data from the previous successful
write command. If a write command is submitted with size greater than
the AWUPF value, then there is no guarantee of data returned on
subsequent reads of the associated logical blocks.

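
To make the "0's based value" wording in the quote concrete, here is a
minimal Python sketch of how a host might decode the field and decide
whether a write falls under the power-fail guarantee. The function names
are hypothetical, and the 16-bit range check is an assumption about the
width of the field; neither comes from this message. It also ignores the
per-namespace NAWUPF override mentioned in the quote.

```python
def awupf_blocks(awupf_field: int) -> int:
    """Decode the 0's based AWUPF field into a count of logical blocks.

    A raw field value of 0 means 1 logical block, 1 means 2, and so on.
    The 16-bit range check is an assumption about the field width.
    """
    if not 0 <= awupf_field <= 0xFFFF:
        raise ValueError("AWUPF field out of range")
    return awupf_field + 1


def write_is_power_fail_atomic(num_blocks: int, awupf_field: int) -> bool:
    """Return True if a write of num_blocks logical blocks is no larger
    than the decoded AWUPF size, i.e. it is covered by the guarantee."""
    if num_blocks < 1:
        raise ValueError("a write covers at least one logical block")
    return num_blocks <= awupf_blocks(awupf_field)


if __name__ == "__main__":
    # A device reporting 0 in the field guarantees single-block atomicity.
    print(awupf_blocks(0))
    print(write_is_power_fail_atomic(1, 0))
    print(write_is_power_fail_atomic(2, 0))
```

With a raw field value of 0, only single-block writes are covered; any
larger write gets no atomicity guarantee across power failure.
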
I take neither blame nor credit for what other storage standards may
implement; this is the only one I had a hand in, and I had to fight
hard to get it.
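
For readers unsure what "don't cross a sector boundary" means for the
original question, the following is a small Python sketch of which
sectors a byte-range write ends up rewriting. The helper names and the
512-byte default sector size are assumptions made for illustration only.

```python
def sectors_touched(offset: int, length: int, sector_size: int = 512) -> range:
    """Return the range of sector numbers that a write of `length` bytes
    at byte `offset` must read, modify and write back."""
    if offset < 0 or length < 1 or sector_size < 1:
        raise ValueError("offset must be >= 0; length and sector_size >= 1")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)


def crosses_sector_boundary(offset: int, length: int, sector_size: int = 512) -> bool:
    """True if the byte range spans more than one sector, in which case
    single-sector atomicity alone does not cover the whole update."""
    return len(sectors_touched(offset, length, sector_size)) > 1


if __name__ == "__main__":
    # Two bytes at offset 510 stay inside sector 0.
    print(list(sectors_touched(510, 2)))
    # Two bytes at offset 511 straddle sectors 0 and 1.
    print(list(sectors_touched(511, 2)))
```
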