From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
To: "Darrick J. Wong" <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Subject: Re: [PATCH 10/12] xfs: use iomap to implement DAX
Date: Thu, 15 Sep 2016 08:43:20 +0200 [thread overview]
Message-ID: <20160915064320.GA7665@lst.de> (raw)
In-Reply-To: <20160915052933.GH9314-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote:
> I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE --
> prior to this patchset, it was only called via page_mkwrite and what
> used to be write_begin, and all it did was create a delalloc reservation
> to back the write (or actually allocate blocks if extsize was set).
No, it was called from write itself in addition, of course.
> Thinking ahead to integrating reflink with DAX (and presumably
> directio) in both cases _file_iomap_begin has to return a real extent.
> If the range we're writing is shared, then that means that I'd have to
> ensure the range is reserved in the COW fork (the reflink patches already do
> this). Next, if the caller requires a firm mapping (i.e. dax or dio)
> I'd have to allocate the range in the COW fork and have that mapping
> returned to the caller.
Yes. No different than the existing buffered write case with an
extent size hint, really.
> So I guess it would look something like this:
>
> /* reserve reflink cow range like the reflink patches already do */
> if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip))
> xfs_reflink_reserve_cow_range(ip, offset, length);
For zeroing we don't need to reserve anything IFF we are finding a hole,
so this is too early.
> /* do the cow thing if need be */
> if ((flags & IOMAP_WRITE) &&
> xfs_is_reflink_inode(ip) &&
> (IS_DAX(inode) || (flags & IOMAP_DIO)) {
> xfs_reflink_allocate_cow_range(ip, offset, length);
>
> if (xfs_reflink_is_cow_pending(ip, offset))
> xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc);
> else
> xfs_bmapi_read(offset, &imap)
> } else
> xfs_bmapi_read(offset, &imap) /* non-cow io */
We really have three write block allocation variants, of which the first
two look basically the same except for I/O completion handling:
1) For DAX we need to allocate real extents and zero them before use.
2) For buffered I/O with an extent size hint or direct I/O we need to
allocate unwritten extents (and convert after the I/O succeeded for
the range that succeeded)
3) for buffered I/O without and extent size hint we need to created
a delayed allocation
That logic exists both in the old __xfs_get_blocks and and the new
iomap code (although we don't check for the dio case yet).
So I don't think we should need to change anything in terms of COW
in relation to DAX or direct I/O here. I do however need to fully
audit the usage of the reflink calls for the cases 1 and 2. For one
it doesn't really seem nessecary to split the reserve/allocate case
ehre, and I hope that a lot of the bmap btree looks could be saved,
similar to my rewrite of the delalloc case.
> Does that look plausible? If _iomap_begin will have one type of caller
> that only needs a delalloc reservation and another type that actually
> needs a firm mapping, should we make a new IOMAP flag to indicate that
> the caller really needs a firm mapping? IS_DAX is fine for detecting
> the DAX case as this patchset shows, but what about directio?
>
> (At this point Dave suggests on IRC that we just make a dio specific
> iomap_begin. That eliminates the flag, though it'd be a fairly similar
> function.)
As explained above, DIO is exaxtly the same as buffered I/O with
an extent size hint. We already handle it fine. DAX is almost
the same, keyed off a DAX check in lower level code to convert unwritten
extents and zero the blocks.
WARNING: multiple messages have this Message-ID (diff)
From: Christoph Hellwig <hch@lst.de>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-nvdimm@ml01.01.org, ross.zwisler@linux.intel.com,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 10/12] xfs: use iomap to implement DAX
Date: Thu, 15 Sep 2016 08:43:20 +0200 [thread overview]
Message-ID: <20160915064320.GA7665@lst.de> (raw)
In-Reply-To: <20160915052933.GH9314@birch.djwong.org>
On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote:
> I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE --
> prior to this patchset, it was only called via page_mkwrite and what
> used to be write_begin, and all it did was create a delalloc reservation
> to back the write (or actually allocate blocks if extsize was set).
No, it was called from write itself in addition, of course.
> Thinking ahead to integrating reflink with DAX (and presumably
> directio) in both cases _file_iomap_begin has to return a real extent.
> If the range we're writing is shared, then that means that I'd have to
> ensure the range is reserved in the COW fork (the reflink patches already do
> this). Next, if the caller requires a firm mapping (i.e. dax or dio)
> I'd have to allocate the range in the COW fork and have that mapping
> returned to the caller.
Yes. No different than the existing buffered write case with an
extent size hint, really.
> So I guess it would look something like this:
>
> /* reserve reflink cow range like the reflink patches already do */
> if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip))
> xfs_reflink_reserve_cow_range(ip, offset, length);
For zeroing we don't need to reserve anything IFF we are finding a hole,
so this is too early.
> /* do the cow thing if need be */
> if ((flags & IOMAP_WRITE) &&
> xfs_is_reflink_inode(ip) &&
> (IS_DAX(inode) || (flags & IOMAP_DIO)) {
> xfs_reflink_allocate_cow_range(ip, offset, length);
>
> if (xfs_reflink_is_cow_pending(ip, offset))
> xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc);
> else
> xfs_bmapi_read(offset, &imap)
> } else
> xfs_bmapi_read(offset, &imap) /* non-cow io */
We really have three write block allocation variants, of which the first
two look basically the same except for I/O completion handling:
1) For DAX we need to allocate real extents and zero them before use.
2) For buffered I/O with an extent size hint or direct I/O we need to
allocate unwritten extents (and convert after the I/O succeeded for
the range that succeeded)
3) for buffered I/O without and extent size hint we need to created
a delayed allocation
That logic exists both in the old __xfs_get_blocks and and the new
iomap code (although we don't check for the dio case yet).
So I don't think we should need to change anything in terms of COW
in relation to DAX or direct I/O here. I do however need to fully
audit the usage of the reflink calls for the cases 1 and 2. For one
it doesn't really seem nessecary to split the reserve/allocate case
ehre, and I hope that a lot of the bmap btree looks could be saved,
similar to my rewrite of the delalloc case.
> Does that look plausible? If _iomap_begin will have one type of caller
> that only needs a delalloc reservation and another type that actually
> needs a firm mapping, should we make a new IOMAP flag to indicate that
> the caller really needs a firm mapping? IS_DAX is fine for detecting
> the DAX case as this patchset shows, but what about directio?
>
> (At this point Dave suggests on IRC that we just make a dio specific
> iomap_begin. That eliminates the flag, though it'd be a fairly similar
> function.)
As explained above, DIO is exaxtly the same as buffered I/O with
an extent size hint. We already handle it fine. DAX is almost
the same, keyed off a DAX check in lower level code to convert unwritten
extents and zero the blocks.
next prev parent reply other threads:[~2016-09-15 6:43 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-14 10:01 iomap based DAX path V2 Christoph Hellwig
2016-09-14 10:01 ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 03/12] dax: don't pass buffer_head to dax_insert_mapping Christoph Hellwig
2016-09-14 10:01 ` [PATCH 04/12] dax: don't pass buffer_head to copy_user_dax Christoph Hellwig
2016-09-14 10:01 ` [PATCH 05/12] dax: provide an iomap based dax read/write path Christoph Hellwig
[not found] ` <1473847291-18913-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 17:26 ` Ross Zwisler
2016-09-14 17:26 ` Ross Zwisler
2016-09-14 10:01 ` [PATCH 06/12] dax: provide an iomap based fault handler Christoph Hellwig
2016-09-14 17:27 ` Ross Zwisler
[not found] ` <20160914172717.GB30852-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-15 5:13 ` Christoph Hellwig
2016-09-15 5:13 ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 07/12] xfs: fix locking for DAX writes Christoph Hellwig
2016-09-14 10:01 ` [PATCH 08/12] xfs: take the ilock shared if possible in xfs_file_iomap_begin Christoph Hellwig
2016-09-14 10:01 ` [PATCH 09/12] xfs: refactor xfs_setfilesize Christoph Hellwig
2016-09-14 10:01 ` [PATCH 10/12] xfs: use iomap to implement DAX Christoph Hellwig
[not found] ` <1473847291-18913-11-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 17:32 ` Ross Zwisler
2016-09-14 17:32 ` Ross Zwisler
[not found] ` <20160914173247.GC30852-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-15 5:14 ` Christoph Hellwig
2016-09-15 5:14 ` Christoph Hellwig
2016-09-15 5:29 ` Darrick J. Wong
[not found] ` <20160915052933.GH9314-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-09-15 6:43 ` Christoph Hellwig [this message]
2016-09-15 6:43 ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 11/12] ext2: stop passing buffer_head to ext2_get_blocks Christoph Hellwig
[not found] ` <1473847291-18913-12-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 22:42 ` Ross Zwisler
2016-09-14 22:42 ` Ross Zwisler
[not found] ` <1473847291-18913-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 10:01 ` [PATCH 01/12] iomap: add IOMAP_F_NEW flag Christoph Hellwig
2016-09-14 10:01 ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 02/12] iomap: expose iomap_apply outside iomap.c Christoph Hellwig
2016-09-14 10:01 ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 12/12] ext2: use iomap to implement DAX Christoph Hellwig
2016-09-14 10:01 ` Christoph Hellwig
[not found] ` <1473847291-18913-13-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 22:51 ` Ross Zwisler
2016-09-14 22:51 ` Ross Zwisler
2016-09-15 5:14 ` Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2016-09-16 11:27 iomap based DAX path V3 Christoph Hellwig
[not found] ` <1474025234-13804-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-16 11:27 ` [PATCH 10/12] xfs: use iomap to implement DAX Christoph Hellwig
2016-09-16 11:27 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160915064320.GA7665@lst.de \
--to=hch-jcswghmuv9g@public.gmane.org \
--cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org \
--cc=linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.