linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <matthew@wil.cx>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 0/3] Add XIP support to ext4
Date: Fri, 20 Dec 2013 13:11:00 -0700
Message-ID: <20131220201059.GH19166@parisc-linux.org>
In-Reply-To: <20131220193455.GA6912@thunk.org>

On Fri, Dec 20, 2013 at 02:34:55PM -0500, Theodore Ts'o wrote:
> On Fri, Dec 20, 2013 at 11:17:31AM -0700, Matthew Wilcox wrote:
> > Maybe.  We have a tension here between wanting to avoid unnecessary
> > writes to the media (as you say, wear is going to be important for some
> > media, if not all) and wanting to not fragment files (both for extent
> > tree compactness and so that we can use PMD or even PGD mappings if the
> > stars align).  It'll be up to the filesystem whether it chooses to satisfy
> > the get_block request with something prezeroed, or something that aligns
> > nicely.  Ideally, it'll be able to find a block of storage that does both!
> > 
> > Actually, I now see a second way to read what you wrote.  If you meant
> > "we can map in ZERO_PAGE or one of its analogs", then no.  The amount
> > of cruft that optimisation added to the filemap_xip code is horrendous.
> > I don't think it's a particularly common workload (mmap a holey file,
> > read lots of zeroes out of it without ever writing to it), so I think
> > it's far better to allocate a page of storage and zero it.
> 
> It seems that you're primarily focused on allocated versus
> unallocated blocks, and what I think Dave and I are trying to point
> out is the distinction between initialized and uninitialized blocks
> (which are already allocated).

I understand the difference for filesystems on block devices; filesystem
blocks can be:

 * allocated, initialised (eg: after they've been written to)
 * allocated, uninitialised (eg: after fallocate)
 * unallocated, initialised (eg: written to in the page cache)
 * unallocated, uninitialised

A filesystem on top of an XIP device can't have an unallocated,
initialised block.  There's no page cache to buffer the write in, so at
the point where you're going to store to it, you have to allocate.

You also can't mmap() an allocated, uninitialised block.  It's got to
be initialised before we can insert a PTE that points to it.
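
To make the state table above concrete, here is a rough sketch in C.
All of the names are made up for illustration; none of this is
existing kernel API.  It encodes the two constraints just described
(no unallocated-but-initialised blocks under XIP, and no PTE pointing
at an uninitialised block), and it assumes the "allocate a page of
storage and zero it" policy for holes rather than a ZERO_PAGE mapping.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, for illustration only (not kernel API).
 * A block is described by two independent properties:
 *   allocated   - it has storage assigned on the media
 *   initialised - it holds defined data (written or zeroed)
 */

/* On a block device all four combinations can occur, because the page
 * cache can hold initialised data for a block with no storage yet. */
static bool blockdev_state_possible(bool allocated, bool initialised)
{
	(void)allocated;
	(void)initialised;
	return true;
}

/* With XIP there is no page cache to buffer a write, so initialised
 * data can only exist in allocated storage. */
static bool xip_state_possible(bool allocated, bool initialised)
{
	return allocated || !initialised;
}

/* A PTE may only point at storage that is allocated and initialised. */
static bool xip_can_map(bool allocated, bool initialised)
{
	return allocated && initialised;
}

/* Work needed before a fault may insert a PTE for the block.  Assumes
 * holes are filled by allocating and zeroing, not by a ZERO_PAGE. */
enum xip_fault_prep {
	XIP_PREP_NONE,           /* already allocated and initialised */
	XIP_PREP_ZERO,           /* allocated (eg: fallocate), needs zeroing */
	XIP_PREP_ALLOC_AND_ZERO, /* hole: allocate storage, then zero it */
};

static enum xip_fault_prep xip_prepare_for_map(bool allocated,
					       bool initialised)
{
	if (!allocated)
		return XIP_PREP_ALLOC_AND_ZERO;
	if (!initialised)
		return XIP_PREP_ZERO;
	return XIP_PREP_NONE;
}
```

So a block left behind by fallocate needs only zeroing before it can
be mapped, while a hole needs storage first; in neither case can the
PTE go in before the block is initialised.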

> So I was thinking about the case where the blocks were already
> allocated and mapped --- so we have a logical -> physical block
> mapping already established.  However, the blocks were allocated
> via fallocate(2), so they are uninitialized, although they will be
> well-aligned.
> 
> Which means that if you pre-zero at read time, at that point you will
> be fragmenting the extent tree, and the blocks are already
> well-aligned so it's in fact better to fault in a zero page at read
> time when we are dealing with an allocated, but not-yet-initialized
> block.

Just to check here, you mean a ZERO_PAGE, right?  Or a page cache page
that has been zeroed?

> Also, one of the ways which we handle fragmentation is via delayed
> allocation.  That is, we don't make the allocation decision until the
> last possible second.  We do lose this optimization for direct I/O,
> since that's part of the nature of the beast --- but there's no reason
> not to have it for XIP writes --- especially if the goal is to be able
> to support persistent memory storage devices in a first class way,
> instead of a one-off hack for demonstration purposes....

I think it's also the nature of the beast that you lose this optimisation
for XIP writes.  Sure, there are multiple ways we could do _not exactly_
XIP to take advantage of persistent memory, and I think we should discuss
them, but we should leave XIP as meaning XIP.  I have some great ideas
about how a hybrid approach could work ...

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

