From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>,
Toshi Kani <toshi.kani@hp.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler
Date: Mon, 3 Mar 2014 16:07:35 -0700 (MST) [thread overview]
Message-ID: <alpine.OSX.2.00.1403031549110.34680@scrumpy> (raw)
In-Reply-To: <20140302233027.GR30131@dastard>
On Mon, 3 Mar 2014, Dave Chinner wrote:
> On Fri, Feb 28, 2014 at 03:20:31PM -0500, Matthew Wilcox wrote:
> > On Fri, Feb 28, 2014 at 10:49:31AM -0700, Toshi Kani wrote:
> > > The original code,
> > > xip_file_fault(), jumps to found: and calls vm_insert_mixed() when
> > > get_xip_mem(,,0,,) succeeded. If get_xip_mem() returns -ENODATA, it
> > > calls either get_xip_mem(,,1,,) or xip_sparse_page(). In this new
> > > function, it looks to me that get_block(,,,0) returns 0 for both cases
> > > (success and -ENODATA previously), which are dealt in the same way. Is
> > > that right? If so, is there any reason for the change?
> >
> > Yes, get_xip_mem() returned -ENODATA for a hole. That was a suboptimal
> > interface because filesystems are actually capable of returning more
> > information than that, eg how long the hole is (ext4 *doesn't*, but I
> > consider that to be a bug).
> >
> > I don't get to decide what the get_block() interface looks like. It's the
> > standard way that the VFS calls back into the filesystem and has been
> > around for probably close to twenty years at this point. I'm still trying
> > to understand exactly what the contract is for get_blocks() ... I have
> > a document that I'm working on to try to explain it, but it's tough going!
> >
> > > Also, isn't it
> > > possible to call get_block(,,,1) even if get_block(,,,0) found a block?
> >
> > The code in question looks like this:
> >
> > error = get_block(inode, block, &bh, 0);
> > if (error || bh.b_size < PAGE_SIZE)
> > goto sigbus;
> >
> > if (!buffer_written(&bh) && !vmf->cow_page) {
> > if (vmf->flags & FAULT_FLAG_WRITE) {
> > error = get_block(inode, block, &bh, 1);
> >
> > where buffer_written is defined as:
> > return buffer_mapped(bh) && !buffer_unwritten(bh);
> >
> > Doing some boolean algebra, that's:
> >
> > if (!buffer_mapped || buffer_unwritten)
> >
> > In either case, we want to tell the filesystem that we're writing to
> > this block. At least, that's my current understanding of the get_block()
> > interface. I'm open to correction here!
>
> I've got a rewritten version on this that doesn't require two calls
> to get_block() that I wrote while prototyping the XFS code. It also
> fixes all the misunderstandings about what get_block() actually does
> and returns so it works correctly with XFS.
>
> I need to port it forward to your new patch set (hopefully later
> this week), so don't spend too much time trying to work out exactly
> what this code needs to do...
Here is a writeup from Matthew Wilcox describing the get_block() interface.
He sent this to me before Dave sent out the latest mail in this thread. :)
Corrections and updates are very welcome.
- Ross
================
get_block_t is used by the VFS to ask filesystem to translate logical
blocks within a file to sectors on a block device.
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
get_block() must not be called simultaneously with the create flag set
for overlapping extents *** or is/was this a bug in ext2? ***
Despite the iblock argument having type sector_t, iblock is actually
in units of the file block size, not in units of 512-byte sectors.
iblock must not extend beyond i_size. *** um, looks like xfs permits
this ... ? ***
If there is no current mapping from the block to the media, one will be
created if 'create' is set to 1. 'create' should not be set to a value
other than '0' or '1'.
On entry, bh_result should have b_size set to the number of bytes that the
caller is interested in and b_state initialised to zero. b_size should
be a multiple of the file block size. On exit, bh_result describes a
physically contiguous extent starting at iblock. b_size will not be
increased by get_block, but it may be decreased if the filesystem extent
is shorter than the extent requested.
If bh_result describes an extent that is allocated, then BH_Mapped will
be set, and b_bdev and b_blocknr will be set to indicate the physical
location on the media. The filesystem may wish to use map_bh() in order
to set BH_Mapped and initialise b_bdev, b_blocknr and b_size. If the
filesystem knows that the extent is read into memory (eg because it
decided to populate the page cache as part of its get_block operation),
it should set BH_Uptodate.
If bh_result describes a hole, the filesystem should clear BH_Mapped and
set BH_Uptodate. It will not set b_bdev or b_blocknr, but it should set
b_size to indicate the length of the hole. It may also opt to leave
bh_result untouched as described above. If the block corresponds to
a hole, bh_result *may* be unmodified, but the VFS can optimise some
operations if the filesystem reports the length of the hole as described
below. *** or is this a bug in ext4? ***
If bh_result describes an extent which has data in the pagecache, but
that data has not yet had space allocated on the media (due to delayed
allocation), BH_Mapped, BH_Uptodate and BH_Delay will be set. b_blocknr
is not set.
If bh_result describes an extent which is reserved for data which has not
yet been written, BH_Unwritten is set, b_bdev and b_blocknr are set.
Other flags that may be set include BH_New (to indicate that the buffer
was newly allocated and may need to be initialised), and BH_Boundary (to
indicate a physical discontinuity after this extent).
The filesystem may choose to set b_private. Other fields in buffer_head
are legacy buffer-cache uses and will not be modified by get_block.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>,
Toshi Kani <toshi.kani@hp.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler
Date: Mon, 3 Mar 2014 16:07:35 -0700 (MST) [thread overview]
Message-ID: <alpine.OSX.2.00.1403031549110.34680@scrumpy> (raw)
In-Reply-To: <20140302233027.GR30131@dastard>
On Mon, 3 Mar 2014, Dave Chinner wrote:
> On Fri, Feb 28, 2014 at 03:20:31PM -0500, Matthew Wilcox wrote:
> > On Fri, Feb 28, 2014 at 10:49:31AM -0700, Toshi Kani wrote:
> > > The original code,
> > > xip_file_fault(), jumps to found: and calls vm_insert_mixed() when
> > > get_xip_mem(,,0,,) succeeded. If get_xip_mem() returns -ENODATA, it
> > > calls either get_xip_mem(,,1,,) or xip_sparse_page(). In this new
> > > function, it looks to me that get_block(,,,0) returns 0 for both cases
> > > (success and -ENODATA previously), which are dealt in the same way. Is
> > > that right? If so, is there any reason for the change?
> >
> > Yes, get_xip_mem() returned -ENODATA for a hole. That was a suboptimal
> > interface because filesystems are actually capable of returning more
> > information than that, eg how long the hole is (ext4 *doesn't*, but I
> > consider that to be a bug).
> >
> > I don't get to decide what the get_block() interface looks like. It's the
> > standard way that the VFS calls back into the filesystem and has been
> > around for probably close to twenty years at this point. I'm still trying
> > to understand exactly what the contract is for get_blocks() ... I have
> > a document that I'm working on to try to explain it, but it's tough going!
> >
> > > Also, isn't it
> > > possible to call get_block(,,,1) even if get_block(,,,0) found a block?
> >
> > The code in question looks like this:
> >
> > error = get_block(inode, block, &bh, 0);
> > if (error || bh.b_size < PAGE_SIZE)
> > goto sigbus;
> >
> > if (!buffer_written(&bh) && !vmf->cow_page) {
> > if (vmf->flags & FAULT_FLAG_WRITE) {
> > error = get_block(inode, block, &bh, 1);
> >
> > where buffer_written is defined as:
> > return buffer_mapped(bh) && !buffer_unwritten(bh);
> >
> > Doing some boolean algebra, that's:
> >
> > if (!buffer_mapped || buffer_unwritten)
> >
> > In either case, we want to tell the filesystem that we're writing to
> > this block. At least, that's my current understanding of the get_block()
> > interface. I'm open to correction here!
>
> I've got a rewritten version on this that doesn't require two calls
> to get_block() that I wrote while prototyping the XFS code. It also
> fixes all the misunderstandings about what get_block() actually does
> and returns so it works correctly with XFS.
>
> I need to port it forward to your new patch set (hopefully later
> this week), so don't spend too much time trying to work out exactly
> what this code needs to do...
Here is a writeup from Matthew Wilcox describing the get_block() interface.
He sent this to me before Dave sent out the latest mail in this thread. :)
Corrections and updates are very welcome.
- Ross
================
get_block_t is used by the VFS to ask filesystem to translate logical
blocks within a file to sectors on a block device.
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
get_block() must not be called simultaneously with the create flag set
for overlapping extents *** or is/was this a bug in ext2? ***
Despite the iblock argument having type sector_t, iblock is actually
in units of the file block size, not in units of 512-byte sectors.
iblock must not extend beyond i_size. *** um, looks like xfs permits
this ... ? ***
If there is no current mapping from the block to the media, one will be
created if 'create' is set to 1. 'create' should not be set to a value
other than '0' or '1'.
On entry, bh_result should have b_size set to the number of bytes that the
caller is interested in and b_state initialised to zero. b_size should
be a multiple of the file block size. On exit, bh_result describes a
physically contiguous extent starting at iblock. b_size will not be
increased by get_block, but it may be decreased if the filesystem extent
is shorter than the extent requested.
If bh_result describes an extent that is allocated, then BH_Mapped will
be set, and b_bdev and b_blocknr will be set to indicate the physical
location on the media. The filesystem may wish to use map_bh() in order
to set BH_Mapped and initialise b_bdev, b_blocknr and b_size. If the
filesystem knows that the extent is read into memory (eg because it
decided to populate the page cache as part of its get_block operation),
it should set BH_Uptodate.
If bh_result describes a hole, the filesystem should clear BH_Mapped and
set BH_Uptodate. It will not set b_bdev or b_blocknr, but it should set
b_size to indicate the length of the hole. It may also opt to leave
bh_result untouched as described above. If the block corresponds to
a hole, bh_result *may* be unmodified, but the VFS can optimise some
operations if the filesystem reports the length of the hole as described
below. *** or is this a bug in ext4? ***
If bh_result describes an extent which has data in the pagecache, but
that data has not yet had space allocated on the media (due to delayed
allocation), BH_Mapped, BH_Uptodate and BH_Delay will be set. b_blocknr
is not set.
If bh_result describes an extent which is reserved for data which has not
yet been written, BH_Unwritten is set, b_bdev and b_blocknr are set.
Other flags that may be set include BH_New (to indicate that the buffer
was newly allocated and may need to be initialised), and BH_Boundary (to
indicate a physical discontinuity after this extent).
The filesystem may choose to set b_private. Other fields in buffer_head
are legacy buffer-cache uses and will not be modified by get_block.
next prev parent reply other threads:[~2014-03-03 23:07 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-25 14:18 [PATCH v6 00/22] Support ext4 on NV-DIMMs Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 01/22] Fix XIP fault vs truncate race Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 02/22] Allow page fault handlers to perform the COW Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 03/22] axonram: Fix bug in direct_access Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 04/22] Change direct_access calling convention Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 05/22] Introduce IS_DAX(inode) Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 06/22] Replace XIP read and write with DAX I/O Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-03-11 0:32 ` Toshi Kani
2014-03-11 0:32 ` Toshi Kani
2014-03-11 12:53 ` Matthew Wilcox
2014-03-11 12:53 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 07/22] Replace the XIP page fault handler with the DAX page fault handler Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-28 17:49 ` Toshi Kani
2014-02-28 17:49 ` Toshi Kani
2014-02-28 20:20 ` Matthew Wilcox
2014-02-28 20:20 ` Matthew Wilcox
2014-02-28 22:18 ` Toshi Kani
2014-02-28 22:18 ` Toshi Kani
2014-03-02 23:30 ` Dave Chinner
2014-03-02 23:30 ` Dave Chinner
2014-03-03 23:07 ` Ross Zwisler [this message]
2014-03-03 23:07 ` Ross Zwisler
2014-03-04 0:56 ` Dave Chinner
2014-03-04 0:56 ` Dave Chinner
2014-03-20 19:38 ` Matthew Wilcox
2014-03-20 19:38 ` Matthew Wilcox
2014-03-20 23:55 ` Dave Chinner
2014-03-20 23:55 ` Dave Chinner
2014-02-25 14:18 ` [PATCH v6 08/22] Replace xip_truncate_page with dax_truncate_page Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 09/22] Remove mm/filemap_xip.c Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 10/22] Remove get_xip_mem Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 11/22] Replace ext2_clear_xip_target with dax_clear_blocks Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 12/22] ext2: Remove ext2_xip_verify_sb() Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 13/22] ext2: Remove ext2_use_xip Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 14/22] ext2: Remove xip.c and xip.h Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 15/22] Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 16/22] ext2: Remove ext2_aops_xip Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 17/22] Get rid of most mentions of XIP in ext2 Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 18/22] xip: Add xip_zero_page_range Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 19/22] ext4: Make ext4_block_zero_page_range static Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 20/22] ext4: Add DAX functionality Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 21/22] ext4: Fix typos Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-25 14:18 ` [PATCH v6 22/22] dax: Add reporting of major faults Matthew Wilcox
2014-02-25 14:18 ` Matthew Wilcox
2014-02-26 15:07 ` [PATCH v6 23/22] Bugfixes Matthew Wilcox
2014-02-26 15:07 ` Matthew Wilcox
2014-02-27 14:01 ` [PATCH v6 00/22] Support ext4 on NV-DIMMs Florian Weimer
2014-02-27 14:01 ` Florian Weimer
2014-02-27 16:29 ` Matthew Wilcox
2014-02-27 16:29 ` Matthew Wilcox
2014-02-27 16:36 ` Florian Weimer
2014-02-27 16:36 ` Florian Weimer
2014-03-02 8:22 ` Pavel Machek
2014-03-02 8:22 ` Pavel Machek
[not found] ` <CF4DEE22.25C8F%matthew.r.wilcox@intel.com>
2014-03-18 18:45 ` [PATCH v6 20/22] ext4: Add DAX functionality Ross Zwisler
2014-03-18 18:45 ` Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.OSX.2.00.1403031549110.34680@scrumpy \
--to=ross.zwisler@linux.intel.com \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matthew.r.wilcox@intel.com \
--cc=toshi.kani@hp.com \
--cc=willy@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.