linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler
Date: Mon, 3 Mar 2014 10:30:27 +1100	[thread overview]
Message-ID: <20140302233027.GR30131@dastard> (raw)
In-Reply-To: <20140228202031.GB12820@linux.intel.com>

On Fri, Feb 28, 2014 at 03:20:31PM -0500, Matthew Wilcox wrote:
> On Fri, Feb 28, 2014 at 10:49:31AM -0700, Toshi Kani wrote:
> > On Tue, 2014-02-25 at 09:18 -0500, Matthew Wilcox wrote:
> > > Instead of calling aops->get_xip_mem from the fault handler, the
> > > filesystem passes a get_block_t that is used to find the appropriate
> > > blocks.
> >  :
> > > +static int do_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> > > +			get_block_t get_block)
> > > +{
> > > +	struct file *file = vma->vm_file;
> > > +	struct inode *inode = file_inode(file);
> > > +	struct address_space *mapping = file->f_mapping;
> > > +	struct buffer_head bh;
> > > +	unsigned long vaddr = (unsigned long)vmf->virtual_address;
> > > +	sector_t block;
> > > +	pgoff_t size;
> > > +	unsigned long pfn;
> > > +	int error;
> > > +
> > > +	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +	if (vmf->pgoff >= size)
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	memset(&bh, 0, sizeof(bh));
> > > +	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - inode->i_blkbits);
> > > +	bh.b_size = PAGE_SIZE;
> > > +	error = get_block(inode, block, &bh, 0);
> > > +	if (error || bh.b_size < PAGE_SIZE)
> > > +		return VM_FAULT_SIGBUS;
> > 
> > I am learning the code and have some questions.
> 
> Hi Toshi,
> 
> Glad to see you're looking at it.  Let me try to help ...
> 
> > The original code,
> > xip_file_fault(), jumps to found: and calls vm_insert_mixed() when
> > get_xip_mem(,,0,,) succeeded.  If get_xip_mem() returns -ENODATA, it
> > calls either get_xip_mem(,,1,,) or xip_sparse_page().  In this new
> > function, it looks to me that get_block(,,,0) returns 0 for both cases
> > (success and -ENODATA previously), which are dealt in the same way.  Is
> > that right?  If so, is there any reason for the change?
> 
> Yes, get_xip_mem() returned -ENODATA for a hole.  That was a suboptimal
> interface because filesystems are actually capable of returning more
> information than that, eg how long the hole is (ext4 *doesn't*, but I
> consider that to be a bug).
> 
> I don't get to decide what the get_block() interface looks like.  It's the
> standard way that the VFS calls back into the filesystem and has been
> around for probably close to twenty years at this point.  I'm still trying
> to understand exactly what the contract is for get_blocks() ... I have
> a document that I'm working on to try to explain it, but it's tough going!
> 
> > Also, isn't it
> > possible to call get_block(,,,1) even if get_block(,,,0) found a block?
> 
> The code in question looks like this:
> 
>         error = get_block(inode, block, &bh, 0);
>         if (error || bh.b_size < PAGE_SIZE)
>                 goto sigbus;
> 
>         if (!buffer_written(&bh) && !vmf->cow_page) {
>                 if (vmf->flags & FAULT_FLAG_WRITE) {
>                         error = get_block(inode, block, &bh, 1);
> 
> where buffer_written is defined as:
>         return buffer_mapped(bh) && !buffer_unwritten(bh);
> 
> Doing some boolean algebra, that's:
> 
> 	if (!buffer_mapped || buffer_unwritten)
> 
> In either case, we want to tell the filesystem that we're writing to
> this block.  At least, that's my current understanding of the get_block()
> interface.  I'm open to correction here!

I've got a rewritten version on this that doesn't require two calls
to get_block() that I wrote while prototyping the XFS code. It also
fixes all the misunderstandings about what get_block() actually does
and returns so it works correctly with XFS.

I need to port it forward to your new patch set (hopefully later
this week), so don't spend too much time trying to work out exactly
what this code needs to do...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2014-03-02 23:30 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-25 14:18 [PATCH v6 00/22] Support ext4 on NV-DIMMs Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 01/22] Fix XIP fault vs truncate race Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 02/22] Allow page fault handlers to perform the COW Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 03/22] axonram: Fix bug in direct_access Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 04/22] Change direct_access calling convention Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 05/22] Introduce IS_DAX(inode) Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 06/22] Replace XIP read and write with DAX I/O Matthew Wilcox
2014-03-11  0:32   ` Toshi Kani
2014-03-11 12:53     ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler Matthew Wilcox
2014-02-28 17:49   ` Toshi Kani
2014-02-28 20:20     ` Matthew Wilcox
2014-02-28 22:18       ` Toshi Kani
2014-03-02 23:30       ` Dave Chinner [this message]
2014-03-03 23:07         ` Ross Zwisler
2014-03-04  0:56           ` Dave Chinner
2014-03-20 19:38             ` Matthew Wilcox
2014-03-20 23:55               ` Dave Chinner
2014-02-25 14:18 ` [PATCH v6 08/22] Replace xip_truncate_page with dax_truncate_page Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 09/22] Remove mm/filemap_xip.c Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 10/22] Remove get_xip_mem Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 11/22] Replace ext2_clear_xip_target with dax_clear_blocks Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 12/22] ext2: Remove ext2_xip_verify_sb() Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 13/22] ext2: Remove ext2_use_xip Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 14/22] ext2: Remove xip.c and xip.h Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 15/22] Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 16/22] ext2: Remove ext2_aops_xip Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 17/22] Get rid of most mentions of XIP in ext2 Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 18/22] xip: Add xip_zero_page_range Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 19/22] ext4: Make ext4_block_zero_page_range static Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 20/22] ext4: Add DAX functionality Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 21/22] ext4: Fix typos Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 22/22] dax: Add reporting of major faults Matthew Wilcox
2014-02-26 15:07 ` [PATCH v6 23/22] Bugfixes Matthew Wilcox
2014-02-27 14:01 ` [PATCH v6 00/22] Support ext4 on NV-DIMMs Florian Weimer
2014-02-27 16:29   ` Matthew Wilcox
2014-02-27 16:36     ` Florian Weimer
2014-03-02  8:22 ` Pavel Machek
     [not found] ` <CF4DEE22.25C8F%matthew.r.wilcox@intel.com>
2014-03-18 18:45   ` [PATCH v6 20/22] ext4: Add DAX functionality Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140302233027.GR30131@dastard \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=toshi.kani@hp.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).