linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-nvdimm@ml01.01.org, Matthew Wilcox <mawilcox@microsoft.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-mm@kvack.org, Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH v2 2/9] ext2: tell DAX the size of allocation holes
Date: Mon, 29 Aug 2016 00:41:16 -0700	[thread overview]
Message-ID: <20160829074116.GA16491@infradead.org> (raw)
In-Reply-To: <20160826212934.GA11265@linux.intel.com>

On Fri, Aug 26, 2016 at 03:29:34PM -0600, Ross Zwisler wrote:
> These changes don't remove the things in XFS needed by the old I/O and fault
> paths (e.g.  xfs_get_blocks_direct() is still there an unchanged).  Is the
> correct way forward to get buy-in from ext2/ext4 so that they also move to
> supporting an iomap based I/O path (xfs_file_iomap_begin(),
> xfs_iomap_write_direct(), etc?).  That would allow us to have parallel I/O and
> fault paths for a while, then remove the old buffer_head based versions when
> the three supported filesystems have moved to iomap.
> 
> If ext2 and ext4 don't choose to move to iomap, though, I don't think we want
> to have a separate I/O & fault path for iomap/XFS.  That seems too painful,
> and the old buffer_head version should continue to work, ugly as it may be.

We're going to move forward killing buffer_heads in XFS.  I think ext4
would dramatically benefit from this a well, as would ext2 (although I
think all that DAX work in ext2 is a horrible idea to start with).

If I don't get buy-in for the iomap DAX work in the dax code we'll just
have to keep it separate.  That buffer_head mess just isn't maintainable
the long run.

> 1) In your mail above you say "It also gets rid of the other warts of the DAX
>    path due to pretending to be like direct I/O".  I assume by this you mean
>    the code in dax_do_io() around DIO_LOCKING, inode_dio_begin(), etc?

Yes.

>    Perhaps there are other things as well in XFS, but this is what I see in
>    the DAX code.  If so, yep, this seems like a win.  I don't understand how
>    DIO_LOCKING is relevant to the DAX I/O path, as we never mix buffered and
>    direct access.

It's related to doing stupid copy and paste from direct I/O in the DAX
code.

>    The comment in dax_do_io() for the inode_dio_begin() call says that it
>    prevents the I/O from races with truncate.  Am I correct that we now get
>    this protection via the xfs_rw_ilock()/xfs_rw_iunlock() calls in
>    xfs_file_dax_write()?

Yes, XFS always has a lock over reads that serializes with truncate.
Currenrly it's the XFS i_iolock, but I'll remove that soon and use the
VFS i_rwsem instead.  For ext2/4 we could go straight to i_rwsem in
shared mode.

> 2) Just a nit, I noticed that you used "~(PAGE_SIZE - 1)" in several places in
>    iomap_dax_actor() and iomap_dax_fault() instead of PAGE_MASK.  Was this
>    intentional?

Mostly because that's how I think.  I'm fine using PAGE_MASK, though.

> 3) It's kind of weird having iomap_dax_fault() in fs/dax.c but having
>    iomap_dax_actor() and iomap_dax_rw() in fs/iomap.c?  I'm guessing the
>    latter is placed where it is because it uses iomap_apply(), which is local
>    to fs/iomap.c?  Anyway, it would be nice if we could keep them together, if
>    possible.

It's still work in progress and could use a few cleanups.

> 
> 4) In iomap_dax_actor() you do this check:
> 
> 	WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
> 
>    If we hit this we should bail with -EIO, yea?  Otherwise we could write to
>    unmapped space or something horrible.

Fine with me.

> 5) In iomap_dax_fault, I think the "I/O beyond the end of the file" check
>    might have been broken.  Take for example an I/O to the second page of a
>    file, where the file has size one page.  So:

sure, I can fix this up.

> 6) Regarding the "we don't even have the size hole problem" comment in your
>    mail, the current PMD logic requires us to know the size of the hole.

And a big part of the iomap interface is proper reporting of holes.

  parent reply	other threads:[~2016-08-29  8:00 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-23 22:04 [PATCH v2 0/9] re-enable DAX PMD support Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 1/9] ext4: allow DAX writeback for hole punch Ross Zwisler
2016-09-21 15:22   ` Ross Zwisler
2016-09-22  6:59     ` Jan Kara
2016-09-22 15:51     ` Theodore Ts'o
2016-08-23 22:04 ` [PATCH v2 2/9] ext2: tell DAX the size of allocation holes Ross Zwisler
2016-08-25  7:57   ` Christoph Hellwig
2016-08-25 19:25     ` Ross Zwisler
2016-08-26 21:29     ` Ross Zwisler
2016-08-29  0:42       ` Dave Chinner
2016-08-29  7:41       ` Christoph Hellwig [this message]
2016-08-29 12:57         ` Theodore Ts'o
2016-08-30  7:21           ` Christoph Hellwig
2016-09-09 16:48           ` Ross Zwisler
2016-09-09 20:35             ` Matthew Wilcox
2016-09-09 22:34               ` Dan Williams
2016-09-10  7:31                 ` Christoph Hellwig
2016-09-10  7:50                   ` Matthew Wilcox
2016-09-10 17:49                   ` Theodore Ts'o
2016-09-11  0:42                     ` Matthew Wilcox
2016-09-10  8:15                 ` Matthew Wilcox
2016-09-10 14:56                   ` Dan Williams
2016-09-10  7:30               ` Christoph Hellwig
2016-09-10  7:33                 ` Matthew Wilcox
2016-09-10  7:42                   ` Christoph Hellwig
2016-09-10  7:52                     ` Matthew Wilcox
2016-09-11 12:47                       ` Christoph Hellwig
2016-09-11 22:57                         ` Ross Zwisler
2016-09-10 15:55                 ` Matthew Wilcox
2016-09-15 20:09   ` Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 3/9] ext4: " Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 4/9] dax: remove buffer_size_valid() Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 5/9] dax: make 'wait_table' global variable static Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 6/9] dax: consistent variable naming for DAX entries Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 7/9] dax: coordinate locking for offsets in PMD range Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 8/9] dax: re-enable DAX PMD support Ross Zwisler
2016-08-23 22:04 ` [PATCH v2 9/9] dax: remove "depends on BROKEN" from FS_DAX_PMD Ross Zwisler
2016-08-30 23:01 ` [PATCH v2 0/9] re-enable DAX PMD support Ross Zwisler
2016-08-31 20:20   ` Kani, Toshimitsu
2016-08-31 21:36     ` Ross Zwisler
2016-08-31 22:08       ` Kani, Toshimitsu
2016-09-01 16:21         ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160829074116.GA16491@infradead.org \
    --to=hch@infradead.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=jack@suse.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=mawilcox@microsoft.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).