From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <matthew@wil.cx>
Cc: Theodore Ts'o <tytso@mit.edu>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 0/3] Add XIP support to ext4
Date: Mon, 23 Dec 2013 14:16:16 +1100 [thread overview]
Message-ID: <20131223031616.GE3220@dastard> (raw)
In-Reply-To: <20131220181731.GG19166@parisc-linux.org>
On Fri, Dec 20, 2013 at 11:17:31AM -0700, Matthew Wilcox wrote:
> On Thu, Dec 19, 2013 at 12:18:48PM -0500, Theodore Ts'o wrote:
> > On Thu, Dec 19, 2013 at 10:12:02AM -0700, Matthew Wilcox wrote:
> > >
> > > ... I think it'll actually be ext4_get_block_fault, not _write, and it
> > > will include code to zero the returned blocks if they're uninitialised.
> >
> > I assume what you mean here is if we see that the blocks are
> > uninitialized, we don't need to read from the persistent memory at
> > all; we can just map in a zeroed page, hopefully one from our stock of
> > pre-zeroed pages. Yes?
>
> Maybe. We have a tension here between wanting to avoid unnecessary
> writes to the media (as you say, wear is going to be important for some
> media, if not all) and wanting to not fragment files (both for extent
> tree compactness and so that we can use PMD or even PGD mappings if the
> stars align). It'll be up to the filesystem whether it chooses to satisfy
> the get_block request with something prezeroed, or something that aligns
> nicely. Ideally, it'll be able to find a block of storage that does both!
Yes, XFS with it's per-file extent size hints does that already.
Like I've said previously - this is a solved problem and all we need
is the IO completion callbacks to make sub-extent size IOs convert
the allocated unwritten regions appropriately...
> Actually, I now see a second way to read what you wrote. If you meant
> "we can map in ZERO_PAGE or one of its analogs", then no. The amount
> of cruft that optimisation added to the filemap_xip code is horrendous.
> I don't think it's a particularly common workload (mmap a holey file,
> read lots of zeroes out of it without ever writing to it), so I think
> it's far better to allocate a page of storage and zero it.
Happens far more often than you think in scientific calculations.
Sparse matrices are extremely common, and it's a valid optimistion
to walk then with mmap and have all the uninitialised vectors simply
return zero without having storage space allocated. In this sort of
situation, you really don't want to be allocating and zeroing
persistent memory just because a terabyte sized sparse identity
matrix was mmapped and read in it's entirity during a calculation....
Persistent memory needs to handle sparse files efficiently. I'd
suggest that we already have very well tested mechanism to do
that: the mapping tree on each inode. use the radix tree to index
the space, mapping either a zero page into each hole index that is
mapped read only, and replace it with an allocated, zeroed mapping
at page_mkwrite() time. i.e. use the mapping radix tree to point at
all the pages we've mapped from the backing device rather than just
mapping an anonymous memory address from the backing device
into userspace.
That also opens the door for easily retrofitting buffered writes
into persistent memory if we need them (e.g. mmap() of encrypted
persistent memory).
Once again, we don't need to re-invent the wheel because we already
have one designed specifically for this purpose....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2013-12-23 3:16 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 19:18 [PATCH v3 0/3] Add XIP support to ext4 Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 1/3] Fix XIP fault vs truncate race Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 2/3] xip: Add xip_zero_page_range Matthew Wilcox
2013-12-17 19:18 ` [PATCH v3 3/3] ext4: Add XIP functionality Matthew Wilcox
2013-12-17 22:30 ` [PATCH v3 0/3] Add XIP support to ext4 Dave Chinner
2013-12-18 2:31 ` Matthew Wilcox
2013-12-18 5:01 ` Theodore Ts'o
2013-12-18 14:27 ` Matthew Wilcox
2013-12-19 2:07 ` Theodore Ts'o
2013-12-19 4:12 ` Matthew Wilcox
2013-12-19 4:37 ` Dave Chinner
2013-12-19 5:43 ` Theodore Ts'o
2013-12-19 15:20 ` Matthew Wilcox
2013-12-19 16:17 ` Theodore Ts'o
2013-12-19 17:12 ` Matthew Wilcox
2013-12-19 17:18 ` Theodore Ts'o
2013-12-20 18:17 ` Matthew Wilcox
2013-12-20 19:34 ` Theodore Ts'o
2013-12-20 20:11 ` Matthew Wilcox
2013-12-23 3:36 ` Dave Chinner
2013-12-23 3:45 ` Matthew Wilcox
2013-12-23 4:32 ` Dave Chinner
2013-12-23 6:56 ` Dave Chinner
2013-12-23 14:51 ` Theodore Ts'o
2013-12-23 3:16 ` Dave Chinner [this message]
2013-12-24 16:27 ` Matthew Wilcox
2013-12-18 12:33 ` Dave Chinner
2013-12-18 15:22 ` Matthew Wilcox
2013-12-19 0:48 ` Dave Chinner
2013-12-19 1:05 ` Matthew Wilcox
2013-12-19 1:58 ` Dave Chinner
2013-12-19 15:32 ` Matthew Wilcox
2013-12-19 23:46 ` Dave Chinner
2013-12-20 16:45 ` Matthew Wilcox
2013-12-23 4:14 ` Dave Chinner
2013-12-18 18:13 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131223031616.GE3220@dastard \
--to=david@fromorbit.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=matthew.r.wilcox@intel.com \
--cc=matthew@wil.cx \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).