From: Theodore Ts'o <tytso@mit.edu>
To: Matthew Wilcox <matthew@wil.cx>
Cc: Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 0/3] Add XIP support to ext4
Date: Wed, 18 Dec 2013 00:01:27 -0500 [thread overview]
Message-ID: <20131218050127.GA15289@thunk.org> (raw)
In-Reply-To: <20131218023143.GA24491@parisc-linux.org>
On Tue, Dec 17, 2013 at 07:31:43PM -0700, Matthew Wilcox wrote:
> On Wed, Dec 18, 2013 at 09:30:50AM +1100, Dave Chinner wrote:
> > No, you haven't addressed the problem. There is nothing in this
> > patch set that converts an unwritten extent after it is written to.
> > Hence on every subsequent read will return zeros because the block
> > is still marked as unwritten.
>
> I don't understand. Here's the path as I understand it:
>
> xip_file_write -> __xip_file_write -> ext4_get_xip_mem(create=0),
> returns -ENODATA. So we call ext4_get_xip_mem again, this time with
> create=1 which causes ext4_get_block() to allocate blocks.
When Dave says that the extent is unwritten, what he means is that the
block as been allocated, but it is marked as being uninitialized.
Since the block is uninitialized we must not read from that block;
instead, if the user issues a read request to an uninitialized block,
we must return all zero's for that block (lest we reveal stale data).
And if we try to write to an uninitialized block, *after* we write to
the data block, we have to clear the uninitalized block, which in some
cases might mean splitting the extent --- if we have an extent which
maps logical blocks 0 to 5 to physical blocks 100 to 105, and we write
to block #2, will need to change that single uninitialized extent to
three extents --- one covering blocks logical blocks 0-1, one covering
logical block 2, and one covering logical blocks 3-5, where the first
and third would be marked uninitialized, and the second would be
marked initialized. Since we potentially need to convert one extent
to three extents, this might involve an extent tree node split.
You keep talking about allocated vs unallocated, and create=0 and
create=1, but even for an allocated block, that block may be marked
initialized or uninitialized --- and if it is marked uninitialized,
xip_file_write must call a file system-specific callback to allow this
conversion to take place.
Now for persistent memory, if writes are infinitely fast and free
(i.e., there are no significant write cycle limitations), there is a
terrible, terrible hack we could do, which would involve fallocate()
never creating uninitialized extents, but instead writing zero's to
all of the allocated blocks. And whenever we open a file, before we
return a file descriptor, we could scan all of the extent blocks
looking for uninitialized extents, and if so, convert them to
initialized extents by writing zeros to all of the blocks at open
time.
This would be horribly slow for hard drives, but typically the devices
that use XIP are either read-only, or fairly fast. It's really ugly,
and if it turns out that persistent memory is not infinitely fast and
have infinite write cycles (and my understanding is that persistent
memory is either extraordinarily expensive, or not as perfect in terms
speed and wear resistenace as some of their boosters claim), it's not
clear that trying to evade dealing with uninitialized extents is a
smart way to go.
In other words, suppose somone calls fallocate on a 2GB region on an
XIP mounted file system. Would you be happy forcing 2GB's worth of
writes at fallocate time(), just because we don't want to deal with
adding a file system callback in xip_file_write()?
Regards,
- Ted
next prev parent reply other threads:[~2013-12-18 5:01 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 19:18 [PATCH v3 0/3] Add XIP support to ext4 Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 1/3] Fix XIP fault vs truncate race Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 2/3] xip: Add xip_zero_page_range Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 3/3] ext4: Add XIP functionality Matthew Wilcox
2013-12-17 22:30 ` [PATCH v3 0/3] Add XIP support to ext4 Dave Chinner
2013-12-18 2:31 ` Matthew Wilcox
2013-12-18 5:01 ` Theodore Ts'o [this message]
2013-12-18 14:27 ` Matthew Wilcox
2013-12-19 2:07 ` Theodore Ts'o
2013-12-19 4:12 ` Matthew Wilcox
2013-12-19 4:37 ` Dave Chinner
2013-12-19 5:43 ` Theodore Ts'o
2013-12-19 15:20 ` Matthew Wilcox
2013-12-19 16:17 ` Theodore Ts'o
2013-12-19 17:12 ` Matthew Wilcox
2013-12-19 17:18 ` Theodore Ts'o
2013-12-20 18:17 ` Matthew Wilcox
2013-12-20 19:34 ` Theodore Ts'o
2013-12-20 20:11 ` Matthew Wilcox
2013-12-23 3:36 ` Dave Chinner
2013-12-23 3:45 ` Matthew Wilcox
2013-12-23 4:32 ` Dave Chinner
2013-12-23 6:56 ` Dave Chinner
2013-12-23 14:51 ` Theodore Ts'o
2013-12-23 3:16 ` Dave Chinner
2013-12-24 16:27 ` Matthew Wilcox
2013-12-18 12:33 ` Dave Chinner
2013-12-18 15:22 ` Matthew Wilcox
2013-12-19 0:48 ` Dave Chinner
2013-12-19 1:05 ` Matthew Wilcox
2013-12-19 1:58 ` Dave Chinner
2013-12-19 15:32 ` Matthew Wilcox
2013-12-19 23:46 ` Dave Chinner
2013-12-20 16:45 ` Matthew Wilcox
2013-12-23 4:14 ` Dave Chinner
2013-12-18 18:13 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131218050127.GA15289@thunk.org \
--to=tytso@mit.edu \
--cc=david@fromorbit.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=matthew.r.wilcox@intel.com \
--cc=matthew@wil.cx \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).