From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "jack@suse.cz" <jack@suse.cz>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jmoyer@redhat.com" <jmoyer@redhat.com>,
	"hch@lst.de" <hch@lst.de>, "axboe@fb.com" <axboe@fb.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"willy@linux.intel.com" <willy@linux.intel.com>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"david@fromorbit.com" <david@fromorbit.com>
Subject: Re: [PATCH 5/5] block: enable dax for raw block devices
Date: Thu, 22 Oct 2015 16:05:46 +0000
Message-ID: <1445529945.17208.4.camel@intel.com>
In-Reply-To: <20151022093549.GE14445@quack.suse.cz>

On Thu, 2015-10-22 at 11:35 +0200, Jan Kara wrote:
> On Thu 22-10-15 02:42:11, Dan Williams wrote:
> > If an application wants exclusive access to all of the persistent memory
> > provided by an NVDIMM namespace it can use this raw-block-dax facility
> > to forgo establishing a filesystem.  This capability is targeted
> > primarily to hypervisors wanting to provision persistent memory for
> > guests.
> > 
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Jeff Moyer <jmoyer@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  fs/block_dev.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 53 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 3255dcec96b4..c27cd1a21a13 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -1687,13 +1687,65 @@ static const struct address_space_operations def_blk_aops = {
> >  	.is_dirty_writeback = buffer_check_dirty_writeback,
> >  };
> >  
> > +#ifdef CONFIG_FS_DAX
> > +/*
> > + * In the raw block case we do not need to contend with truncation nor
> > + * unwritten file extents.  Without those concerns there is no need for
> > + * additional locking beyond the mmap_sem context that these routines
> > + * are already executing under.
> > + *
> > + * Note, there is no protection if the block device is dynamically
> > + * resized (partition grow/shrink) during a fault. A stable block device
> > + * size is already not enforced in the blkdev_direct_IO path.
> > + *
> > + * For DAX, it is the responsibility of the block device driver to
> > + * ensure the whole-disk device size is stable while requests are in
> > + * flight.
> > + *
> > + * Finally, these paths do not synchronize against freezing
> > + * (sb_start_pagefault(), etc...) since bdev_sops does not support
> > + * freezing.
> 
> Well, for devices freezing is handled directly in the block layer code
> (blk_stop_queue()) since there's no need to put some metadata structures
> into a consistent state. So the comment about bdev_sops is somewhat
> strange.

This text was aimed at the request from Ross to document the differences
vs the generic_file_mmap() path.  Is the following incremental change
more clear?

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 840acd4380d4..4ae8fa55bd1e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1702,9 +1702,15 @@ static const struct address_space_operations def_blk_aops = {
  * ensure the whole-disk device size is stable while requests are in
  * flight.
  *
- * Finally, these paths do not synchronize against freezing
- * (sb_start_pagefault(), etc...) since bdev_sops does not support
- * freezing.
+ * Finally, in contrast to the generic_file_mmap() path, there are no
+ * calls to sb_start_pagefault().  That is meant to synchronize write
+ * faults against requests to freeze the contents of the filesystem
+ * hosting vma->vm_file.  However, in the case of a block device special
+ * file, it is a 0-sized device node usually hosted on devtmpfs, i.e.
+ * nothing to do with the super_block for bdev_file_inode(vma->vm_file).
+ * We could call get_super() in this path to retrieve the right
+ * super_block, but the generic_file_mmap() path does not do this for
+ * the CONFIG_FS_DAX=n case.
  */
 static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
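
The revised comment's point that a block device special file is a 0-sized
node living on some other filesystem (usually devtmpfs) can be observed from
userspace with stat(2). The following is a minimal sketch, not part of the
patch: `struct node_info` and `describe_node()` are hypothetical names
introduced only for illustration. It compares `st_dev` (the filesystem
hosting the node) with `st_rdev` (the device the node names).

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of what stat(2) reports for a device node. */
struct node_info {
	int is_device;		/* 1 if the path is a block or char special file */
	long long size;		/* st_size of the node itself, not the device capacity */
	int hosted_elsewhere;	/* 1 if st_dev (hosting fs) != st_rdev (named device) */
};

/*
 * Fill @out for @path.  Returns 0 on success, -1 if stat(2) fails.
 * Illustration only: the kernel fault path in the patch does not use this.
 */
int describe_node(const char *path, struct node_info *out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	out->is_device = S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode);
	out->size = (long long)st.st_size;
	out->hosted_elsewhere = st.st_dev != st.st_rdev;
	return 0;
}
```

Calling `describe_node()` on a pmem block device node (path depends on the
system) would be expected to report a device node whose own size is 0, which
is why the super_block reachable from vma->vm_file says nothing about the
block device's contents.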

