From: Boaz Harrosh <openosd@gmail.com>
To: Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Ross Zwisler <ross.zwisler@linux.intel.com>,
willy@linux.intel.com
Subject: Re: [PATCH v10 20/21] ext4: Add DAX functionality
Date: Wed, 10 Sep 2014 19:49:40 +0300
Message-ID: <54108124.9030707@gmail.com>
In-Reply-To: <20140903111302.GG20473@dastard>
On 09/03/2014 02:13 PM, Dave Chinner wrote:
<>
>
> When direct IO fails ext4 falls back to buffered IO, right? And
> dax_do_io() can return partial writes, yes?
>
There are no buffered writes with DAX, i.e. "buffered" writes are always
direct as well (no page cache).
> So that means if you get, say, ENOSPC part way through a DAX write,
> ext4 can start dirtying the page cache from
> __generic_file_write_iter() because the DAX write didn't wholly
> complete? And say this ENOSPC races with space being freed from
> another inode, then the buffered write will succeed and we'll end up
> with coherency issues, right?
>
> This is not an idle question - XFS is firing asserts all over the
> place when doing ENOSPC testing because DAX is returning partial
> writes and the XFS direct IO code is expecting them to either wholly
> complete or wholly fail. I can make the DAX variant do allow partial
> writes, but I'm not going to add a useless fallback to buffered IO
> for XFS when the (fully featured) direct allocation fails.
>
Right, no fallback. A fallback would just be a retry, because DAX in any
case assumes there is never a page_cache_page holding written data.
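To make the no-fallback point concrete, here is a minimal user-space model, not kernel code. It assumes a tiny array standing in for NV-DIMM memory, and model_direct_access(), model_alloc_block() and model_dax_write() are hypothetical names that only loosely mirror the roles of bdev->direct_access() and block allocation. Data is copied straight into "device" memory; when allocation fails part way through, the short count is returned to the caller instead of the remainder being pushed through a page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_BLK_SIZE    4096
#define MODEL_DEV_BLOCKS  4   /* hypothetical tiny "NV-DIMM": 4 blocks */
#define MODEL_FILE_BLOCKS 16  /* maximum file size in blocks */

static char model_pmem[MODEL_DEV_BLOCKS][MODEL_BLK_SIZE];
static long model_file_map[MODEL_FILE_BLOCKS]; /* file block -> dev block, -1 = hole */
static long model_next_free;

static void model_reset(void)
{
	for (int i = 0; i < MODEL_FILE_BLOCKS; i++)
		model_file_map[i] = -1;
	model_next_free = 0;
}

/* Hypothetical stand-in for bdev->direct_access(): CPU address of a block. */
static void *model_direct_access(long dev_block)
{
	return model_pmem[dev_block];
}

/* Trivial bump allocator; returns -1 when the "device" is full (ENOSPC). */
static long model_alloc_block(void)
{
	return model_next_free < MODEL_DEV_BLOCKS ? model_next_free++ : -1;
}

/*
 * Copy user data directly into device memory; no page cache is involved.
 * Returns the number of bytes written, which is short if space runs out
 * part way through, or -1 if nothing could be written at all.
 */
static ssize_t model_dax_write(const char *buf, size_t len, size_t pos)
{
	size_t done = 0;

	while (done < len) {
		size_t blk = (pos + done) / MODEL_BLK_SIZE;
		size_t off = (pos + done) % MODEL_BLK_SIZE;
		size_t n = MODEL_BLK_SIZE - off;

		if (blk >= MODEL_FILE_BLOCKS)
			break;                  /* past maximum file size */
		if (n > len - done)
			n = len - done;
		if (model_file_map[blk] < 0) {
			long b = model_alloc_block();
			if (b < 0)
				break;          /* ENOSPC: report a partial write */
			model_file_map[blk] = b;
		}
		memcpy((char *)model_direct_access(model_file_map[blk]) + off,
		       buf + done, n);
		done += n;
	}
	return done ? (ssize_t)done : -1;
}
```

A caller that gets a short count can only retry once space has been freed; in this model there is no second, cached path to fall back to.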
> Indeed, I note that in the dax_fault code, any page found in the
> page cache is explicitly removed and released, and the direct mapped
> block replaces that page in the vma. IOWs, this code expects pages
> to be clean as we're only supposed to have regions covered by holes
> using cached pages (dax_load_hole()).
Exactly: page_cache_pages exist only, and always, for "regions covered by
holes". Once a real block is allocated for an offset, it is mapped
directly into the VMA without a page_cache_page.
> So if we've done a buffered
> write, we're going to toss out dirty pages the moment there is a
> page fault on the range and map the unmodified backing store in
> instead.
>
No! There is never a "buffered write" with DAX. That is, there is never
a page_cache_page holding data that will later be written back to
storage. DAX means no page cache at all.
> That just seems wrong. Maybe I've forgotten something, but this
> looks like a wart that we don't need and shouldn't bake into this
> interface as both ext4 and XFS can allocate into holes and extend
> files from the direct IO interfaces. Of course, correct me if
> I'm wrong about ext4 capabilities...
>
Yes, you have misread the patchset: all writes are always done directly
to bdev->direct_access() memory, *never* via a copy into the page cache.
Currently the only pages in the radix tree are ZERO pages that cover
holes, which are either thrown out as clean or COWed on mkwrite.
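The zero-page/COW lifecycle can be modeled in user space as a sketch under stated assumptions: a per-file-page table of "mappings" stands in for the page tables and radix tree, one static zero-filled buffer stands in for the shared ZERO page, and model_read_fault()/model_mkwrite() are hypothetical names. A read fault on a hole maps the shared zero page; mkwrite allocates and zeroes a real block and replaces the mapping. The zero page itself is never written, so dropping it needs no writeback. Invalidating other mappings of the same offset is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define M_PAGE_SIZE 4096
#define M_NPAGES    8

static const char m_zero_page[M_PAGE_SIZE];  /* shared, always zero */
static char m_blocks[M_NPAGES][M_PAGE_SIZE]; /* hypothetical "device" blocks */
static int m_block_used[M_NPAGES];
static const char *m_mapping[M_NPAGES];      /* what each file page maps to */

/* Read fault: a hole gets the single shared zero page. */
static const char *model_read_fault(size_t pgoff)
{
	if (!m_mapping[pgoff])
		m_mapping[pgoff] = m_zero_page;
	return m_mapping[pgoff];
}

/*
 * mkwrite: if the page still maps the zero page (or nothing), allocate a
 * real block, zero it and replace the mapping.  The zero page is never
 * modified, so it can simply be dropped as clean.
 */
static char *model_mkwrite(size_t pgoff)
{
	if (m_mapping[pgoff] && m_mapping[pgoff] != m_zero_page)
		return (char *)m_mapping[pgoff]; /* already a real block */
	for (int b = 0; b < M_NPAGES; b++) {
		if (!m_block_used[b]) {
			m_block_used[b] = 1;
			memset(m_blocks[b], 0, M_PAGE_SIZE);
			m_mapping[pgoff] = m_blocks[b];
			return m_blocks[b];
		}
	}
	return NULL; /* no space left */
}
```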
BTW Matthew: it took me a while to figure out the VFS/VMA API, but
I managed to map a single ZERO page to all holes and COW them to
real blocks on mkwrite. It needed a combination of flags, but the
main trick is that in mkwrite I do:
	/*
	 * Our zero page doesn't hold the correct file offset in page->index,
	 * so vmf->pgoff is incorrect; let's fix that.
	 */
	vmf->pgoff = vma->vm_pgoff + (((unsigned long)vmf->virtual_address -
				       vma->vm_start) >> PAGE_SHIFT);
	/* call the fault handler to get a real page for writing */
	ret = _xip_file_fault(vma, vmf);
	/* invalidate all other mappings of that location */
	unmap_mapping_range(mapping, (loff_t)vmf->pgoff << PAGE_SHIFT,
			    PAGE_SIZE, 1);
	/* mkwrite must lock the original page and return VM_FAULT_LOCKED */
	if (ret == VM_FAULT_NOPAGE) {
		lock_page(m1fs_zero_page);
		ret = VM_FAULT_LOCKED;
	}
	return ret;
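The pgoff fix-up is plain arithmetic: the file page offset of a faulting address is the VMA's starting file page plus the number of whole pages between vm_start and that address. A small user-space check of the same expression, assuming 4 KiB pages (a PAGE_SHIFT of 12); model_fault_pgoff() is a hypothetical helper name:

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Same expression as the mkwrite fix-up: vm_pgoff + page index within the VMA. */
static unsigned long model_fault_pgoff(unsigned long vm_pgoff,
				       unsigned long vm_start,
				       unsigned long address)
{
	return vm_pgoff + ((address - vm_start) >> MODEL_PAGE_SHIFT);
}
```

Any address inside a page yields that page's offset, because the shift discards the low bits; shifting the result back by PAGE_SHIFT gives the byte offset handed to unmap_mapping_range() in the snippet above.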
In _xip_file_fault(), which is also called from .fault, I do this in the case of a hole:
	if (!(vmf->flags & FAULT_FLAG_WRITE)) {
		...
		block = _find_data_block(inode, vmf->pgoff);
		if (!block) {
			vmf->page = g_zero_page;
			err = vm_insert_page(vma,
					     (unsigned long)vmf->virtual_address,
					     vmf->page);
			goto after_insert;
		}
	} else {
Above, g_zero_page is my own global zero page; ZERO_PAGE() will not work.
_find_data_block() is like your get_buffer, but only for the read case;
the write case uses a different _get_block_create().
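The split between a read-only lookup and an allocating lookup can be sketched as follows. Everything here is hypothetical and user-space only: model_find_data_block() and model_get_block_create() loosely mirror the roles of _find_data_block() and _get_block_create(), and block number 0 means "hole", matching the `if (!block)` test in the fault snippet.

```c
#include <assert.h>

#define MODEL_NPAGES 16

static unsigned long model_map[MODEL_NPAGES]; /* file page -> block, 0 = hole */
static unsigned long model_next_block = 1;    /* block 0 is reserved as "hole" */

/* Read path: look the block up, never allocate.  0 means a hole. */
static unsigned long model_find_data_block(unsigned long pgoff)
{
	return pgoff < MODEL_NPAGES ? model_map[pgoff] : 0;
}

/* Write path: return the existing block or allocate a new one. */
static unsigned long model_get_block_create(unsigned long pgoff)
{
	if (pgoff >= MODEL_NPAGES)
		return 0;
	if (!model_map[pgoff])
		model_map[pgoff] = model_next_block++;
	return model_map[pgoff];
}
```

Keeping the read path allocation-free is what allows every hole to be backed by the same zero page until the first write to it.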
Please tell me whether this is interesting to you; I can try to patch
your DAX patchset to do the same. This can always be done later as an
optimization.
> Cheers,
> Dave.
>
Thanks
Boaz
Thread overview: 52+ messages
2014-08-27 3:45 [PATCH v10 00/21] Support ext4 on NV-DIMMs Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 01/21] axonram: Fix bug in direct_access Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 02/21] Change direct_access calling convention Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 03/21] Fix XIP fault vs truncate race Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 04/21] Allow page fault handlers to perform the COW Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 05/21] Introduce IS_DAX(inode) Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 06/21] Add copy_to_iter(), copy_from_iter() and iov_iter_zero() Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 07/21] Replace XIP read and write with DAX I/O Matthew Wilcox
2014-09-14 14:11 ` Boaz Harrosh
2014-08-27 3:45 ` [PATCH v10 08/21] Replace ext2_clear_xip_target with dax_clear_blocks Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 09/21] Replace the XIP page fault handler with the DAX page fault handler Matthew Wilcox
2014-09-03 7:47 ` Dave Chinner
2014-09-10 15:23 ` Matthew Wilcox
2014-09-11 3:09 ` Dave Chinner
2014-09-24 15:43 ` Matthew Wilcox
2014-09-25 1:01 ` Dave Chinner
2014-08-27 3:45 ` [PATCH v10 10/21] Replace xip_truncate_page with dax_truncate_page Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 11/21] Replace XIP documentation with DAX documentation Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 12/21] Remove get_xip_mem Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 13/21] ext2: Remove ext2_xip_verify_sb() Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 14/21] ext2: Remove ext2_use_xip Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 15/21] ext2: Remove xip.c and xip.h Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 16/21] Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 17/21] ext2: Remove ext2_aops_xip Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 18/21] Get rid of most mentions of XIP in ext2 Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 19/21] xip: Add xip_zero_page_range Matthew Wilcox
2014-09-03 9:21 ` Dave Chinner
2014-09-04 21:08 ` Matthew Wilcox
2014-09-04 21:36 ` Theodore Ts'o
2014-09-08 18:59 ` Matthew Wilcox
2014-08-27 3:45 ` [PATCH v10 20/21] ext4: Add DAX functionality Matthew Wilcox
2014-09-03 11:13 ` Dave Chinner
2014-09-10 16:49 ` Boaz Harrosh [this message]
2014-09-11 4:38 ` Dave Chinner
2014-09-14 12:25 ` Boaz Harrosh
2014-09-15 6:15 ` Dave Chinner
2014-09-15 9:41 ` Boaz Harrosh
2014-08-27 3:45 ` [PATCH v10 21/21] brd: Rename XIP to DAX Matthew Wilcox
2014-08-27 20:06 ` [PATCH v10 00/21] Support ext4 on NV-DIMMs Andrew Morton
2014-08-27 21:12 ` Matthew Wilcox
2014-08-27 21:46 ` Andrew Morton
2014-08-28 1:30 ` Andy Lutomirski
2014-08-28 16:50 ` Matthew Wilcox
2014-08-28 15:45 ` Matthew Wilcox
2014-08-27 21:22 ` Christoph Lameter
2014-08-27 21:30 ` Andrew Morton
2014-08-27 23:04 ` One Thousand Gnomes
2014-08-28 7:17 ` Dave Chinner
2014-08-30 23:11 ` Christian Stroetmann
2014-08-28 8:08 ` Boaz Harrosh
2014-08-28 22:09 ` Zwisler, Ross
2014-09-03 12:05 ` [PATCH 1/1] xfs: add DAX support Dave Chinner