linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>,
	Toshi Kani <toshi.kani@hp.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler
Date: Mon, 3 Mar 2014 16:07:35 -0700 (MST)	[thread overview]
Message-ID: <alpine.OSX.2.00.1403031549110.34680@scrumpy> (raw)
In-Reply-To: <20140302233027.GR30131@dastard>

On Mon, 3 Mar 2014, Dave Chinner wrote:
> On Fri, Feb 28, 2014 at 03:20:31PM -0500, Matthew Wilcox wrote:
> > On Fri, Feb 28, 2014 at 10:49:31AM -0700, Toshi Kani wrote:
> > > The original code,
> > > xip_file_fault(), jumps to found: and calls vm_insert_mixed() when
> > > get_xip_mem(,,0,,) succeeded.  If get_xip_mem() returns -ENODATA, it
> > > calls either get_xip_mem(,,1,,) or xip_sparse_page().  In this new
> > > function, it looks to me that get_block(,,,0) returns 0 for both cases
> > > (success and -ENODATA previously), which are dealt in the same way.  Is
> > > that right?  If so, is there any reason for the change?
> > 
> > Yes, get_xip_mem() returned -ENODATA for a hole.  That was a suboptimal
> > interface because filesystems are actually capable of returning more
> > information than that, eg how long the hole is (ext4 *doesn't*, but I
> > consider that to be a bug).
> > 
> > I don't get to decide what the get_block() interface looks like.  It's the
> > standard way that the VFS calls back into the filesystem and has been
> > around for probably close to twenty years at this point.  I'm still trying
> > to understand exactly what the contract is for get_blocks() ... I have
> > a document that I'm working on to try to explain it, but it's tough going!
> > 
> > > Also, isn't it
> > > possible to call get_block(,,,1) even if get_block(,,,0) found a block?
> > 
> > The code in question looks like this:
> > 
> >         error = get_block(inode, block, &bh, 0);
> >         if (error || bh.b_size < PAGE_SIZE)
> >                 goto sigbus;
> > 
> >         if (!buffer_written(&bh) && !vmf->cow_page) {
> >                 if (vmf->flags & FAULT_FLAG_WRITE) {
> >                         error = get_block(inode, block, &bh, 1);
> > 
> > where buffer_written is defined as:
> >         return buffer_mapped(bh) && !buffer_unwritten(bh);
> > 
> > Doing some boolean algebra, that's:
> > 
> > 	if (!buffer_mapped || buffer_unwritten)
> > 
> > In either case, we want to tell the filesystem that we're writing to
> > this block.  At least, that's my current understanding of the get_block()
> > interface.  I'm open to correction here!
> 
> I've got a rewritten version on this that doesn't require two calls
> to get_block() that I wrote while prototyping the XFS code. It also
> fixes all the misunderstandings about what get_block() actually does
> and returns so it works correctly with XFS.
> 
> I need to port it forward to your new patch set (hopefully later
> this week), so don't spend too much time trying to work out exactly
> what this code needs to do...

Here is a writeup from Matthew Wilcox describing the get_block() interface.

He sent this to me before Dave sent out the latest mail in this thread. :)

Corrections and updates are very welcome.

- Ross

================

get_block_t is used by the VFS to ask filesystem to translate logical
blocks within a file to sectors on a block device.

typedef int (get_block_t)(struct inode *inode, sector_t iblock,
                        struct buffer_head *bh_result, int create);

get_block() must not be called simultaneously with the create flag set
for overlapping extents *** or is/was this a bug in ext2? ***

Despite the iblock argument having type sector_t, iblock is actually
in units of the file block size, not in units of 512-byte sectors.
iblock must not extend beyond i_size. *** um, looks like xfs permits
this ... ? ***

If there is no current mapping from the block to the media, one will be
created if 'create' is set to 1.  'create' should not be set to a value
other than '0' or '1'.

On entry, bh_result should have b_size set to the number of bytes that the
caller is interested in and b_state initialised to zero.  b_size should
be a multiple of the file block size.  On exit, bh_result describes a
physically contiguous extent starting at iblock.  b_size will not be
increased by get_block, but it may be decreased if the filesystem extent
is shorter than the extent requested.

If bh_result describes an extent that is allocated, then BH_Mapped will
be set, and b_bdev and b_blocknr will be set to indicate the physical
location on the media.  The filesystem may wish to use map_bh() in order
to set BH_Mapped and initialise b_bdev, b_blocknr and b_size.  If the
filesystem knows that the extent is read into memory (eg because it
decided to populate the page cache as part of its get_block operation),
it should set BH_Uptodate.

If bh_result describes a hole, the filesystem should clear BH_Mapped and
set BH_Uptodate.  It will not set b_bdev or b_blocknr, but it should set
b_size to indicate the length of the hole.  It may also opt to leave
bh_result untouched as described above.  If the block corresponds to
a hole, bh_result *may* be unmodified, but the VFS can optimise some
operations if the filesystem reports the length of the hole as described
below. *** or is this a bug in ext4? ***

If bh_result describes an extent which has data in the pagecache, but
that data has not yet had space allocated on the media (due to delayed
allocation), BH_Mapped, BH_Uptodate and BH_Delay will be set.  b_blocknr
is not set.

If bh_result describes an extent which is reserved for data which has not
yet been written, BH_Unwritten is set, b_bdev and b_blocknr are set.

Other flags that may be set include BH_New (to indicate that the buffer
was newly allocated and may need to be initialised), and BH_Boundary (to
indicate a physical discontinuity after this extent).

The filesystem may choose to set b_private.  Other fields in buffer_head
are legacy buffer-cache uses and will not be modified by get_block.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-03-03 23:07 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-25 14:18 [PATCH v6 00/22] Support ext4 on NV-DIMMs Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 01/22] Fix XIP fault vs truncate race Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 02/22] Allow page fault handlers to perform the COW Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 03/22] axonram: Fix bug in direct_access Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 04/22] Change direct_access calling convention Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 05/22] Introduce IS_DAX(inode) Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 06/22] Replace XIP read and write with DAX I/O Matthew Wilcox
2014-03-11  0:32   ` Toshi Kani
2014-03-11 12:53     ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler Matthew Wilcox
2014-02-28 17:49   ` Toshi Kani
2014-02-28 20:20     ` Matthew Wilcox
2014-02-28 22:18       ` Toshi Kani
2014-03-02 23:30       ` Dave Chinner
2014-03-03 23:07         ` Ross Zwisler [this message]
2014-03-04  0:56           ` Dave Chinner
2014-03-20 19:38             ` Matthew Wilcox
2014-03-20 23:55               ` Dave Chinner
2014-02-25 14:18 ` [PATCH v6 08/22] Replace xip_truncate_page with dax_truncate_page Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 09/22] Remove mm/filemap_xip.c Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 10/22] Remove get_xip_mem Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 11/22] Replace ext2_clear_xip_target with dax_clear_blocks Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 12/22] ext2: Remove ext2_xip_verify_sb() Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 13/22] ext2: Remove ext2_use_xip Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 14/22] ext2: Remove xip.c and xip.h Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 15/22] Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 16/22] ext2: Remove ext2_aops_xip Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 17/22] Get rid of most mentions of XIP in ext2 Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 18/22] xip: Add xip_zero_page_range Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 19/22] ext4: Make ext4_block_zero_page_range static Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 20/22] ext4: Add DAX functionality Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 21/22] ext4: Fix typos Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 22/22] dax: Add reporting of major faults Matthew Wilcox
2014-02-26 15:07 ` [PATCH v6 23/22] Bugfixes Matthew Wilcox
2014-02-27 14:01 ` [PATCH v6 00/22] Support ext4 on NV-DIMMs Florian Weimer
2014-02-27 16:29   ` Matthew Wilcox
2014-02-27 16:36     ` Florian Weimer
2014-03-02  8:22 ` Pavel Machek
     [not found] ` <CF4DEE22.25C8F%matthew.r.wilcox@intel.com>
2014-03-18 18:45   ` [PATCH v6 20/22] ext4: Add DAX functionality Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.OSX.2.00.1403031549110.34680@scrumpy \
    --to=ross.zwisler@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=toshi.kani@hp.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).