linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 06/22] Replace XIP read and write with DAX I/O
Date: Wed, 9 Apr 2014 22:55:29 +0200
Message-ID: <20140409205529.GO32103@quack.suse.cz>
In-Reply-To: <20140409151908.GD5727@linux.intel.com>

On Wed 09-04-14 11:19:08, Matthew Wilcox wrote:
> On Wed, Apr 09, 2014 at 11:14:50AM +0200, Jan Kara wrote:
> > On Tue 08-04-14 16:21:02, Matthew Wilcox wrote:
> > > On Tue, Apr 08, 2014 at 07:56:00PM +0200, Jan Kara wrote:
> > > > > +static void dax_new_buf(void *addr, unsigned size, unsigned first,
> > > > > +					loff_t offset, loff_t end, int rw)
> > > > > +{
> > > > > +	loff_t final = end - offset + first; /* The final byte of the buffer */
> > > > > +	if (rw != WRITE) {
> > > > > +		memset(addr, 0, size);
> > > > > +		return;
> > > > > +	}
> > > >   It seems counterintuitive to zero out "on-disk" blocks (it seems you'd do
> > > > this for unwritten blocks) when reading from them. Presumably it could also
> > > > have undesired effects on the endurance of persistent memory. Instead I'd
> > > > expect you to simply zero out the user-provided buffer the same way as you
> > > > do for holes.
> > > 
> > > I think we have to zero it here, because the second time we call
> > > get_block() for a given block, it won't be BH_New any more, so we won't
> > > know that it's supposed to be zeroed.
> >   But how can you have a BH_New buffer when you didn't ask get_blocks() to
> > create any block? That would be a bug in the get_blocks() implementation...
> > Or am I missing something?
> 
> Oh ... right.  So just to be clear, we're looking at the case where
> we're doing a read of a filesystem block which is BH_Unwritten, but
> isn't a hole ... so it's been allocated on storage and not yet written.
> That's already treated as a hole:
> 
>                         if (rw == WRITE) {
> ...
>                         } else {
>                                 hole = !buffer_written(bh);
>                         }
> 
> and dax_new_buf is only called in the !hole case.
  Ah, my bad. But then dax_new_buf() will never be called with rw != WRITE:
get_blocks() can never return a BH_New buffer when its 'create' argument is 0.

> > > > > +	if ((flags & DIO_LOCKING) && (rw == READ)) {
> > > > > +		struct address_space *mapping = inode->i_mapping;
> > > > > +		mutex_lock(&inode->i_mutex);
> > > > > +		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
> > > > > +		if (retval) {
> > > > > +			mutex_unlock(&inode->i_mutex);
> > > > > +			goto out;
> > > > > +		}
> > > >   Is there a reason for this? I'd assume DAX has no pages in the page cache...
> > > 
> > > There will be pages in the page cache for holes that we page faulted on.
> > > They must go!  :-)
> >   Well, but this will only write back dirty pages, and if I read the code
> > correctly those pages will never be dirty since dax_mkwrite() will replace
> > them. Or am I missing something?
> 
> In addition to writing back dirty pages, filemap_write_and_wait_range()
> will evict clean pages.  Unintuitive, I know, but it matches what the
> direct I/O path does.  Plus, if we fall back to buffered I/O for holes
> (see above), then this will do the right thing at that time.
  Ugh, I'm pretty certain filemap_write_and_wait_range() doesn't evict
anything ;). The direct IO path calls that function so that a direct IO read
issued after a buffered write returns the written data. In that case we don't
evict anything from the page cache, because a direct IO read doesn't
invalidate any information we have cached. Only a direct IO write does that,
and for that we call invalidate_inode_pages2_range() after writing the pages
back. So I still don't think what you do makes sense. You might need to
invalidate hole pages. But note that generic_file_direct_write() does that
for you. Even though that invalidation isn't serialized in any way with page
faults, which can instantiate the hole pages again, things should work out
fine for you, since that function also invalidates the range again after the
->direct_IO callback is done. So AFAICT you don't have to do anything except
write a nice comment about this ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
