linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] fs: cache dax_device lookup result
@ 2017-08-25  0:35 Dan Williams
  2017-08-25  0:35 ` [PATCH 1/2] xfs: " Dan Williams
  2017-08-25  0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams
  0 siblings, 2 replies; 7+ messages in thread
From: Dan Williams @ 2017-08-25  0:35 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: jack, Darrick J. Wong, linux-nvdimm, linux-kernel, Andreas Dilger,
	Jan Kara, Theodore Ts'o, hch

Christoph notes:

    I just noticed that we now do a fs_dax_get_by_host in every
    iomap_begin call for DAX.  This function iterates a list, does a
    string compared and igrab.  I really think we need to cache this in
    the superblock (possible even the fs superblock) similar to what we
    do for the block device.

Fix this up to cache the result of the dax_device lookup in 'struct
xfs_mount' for xfs and 'struct super_block' for ext2/ext4.

---

Dan Williams (2):
      xfs: cache dax_device lookup result
      ext2, ext4: cache dax_device lookup result


 fs/ext2/inode.c    |   11 +++++++----
 fs/ext4/inode.c    |   11 +++++++----
 fs/xfs/xfs_aops.c  |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_aops.h  |    1 +
 fs/xfs/xfs_buf.h   |    1 +
 fs/xfs/xfs_iomap.c |    9 +--------
 include/linux/fs.h |    3 +++
 7 files changed, 44 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] xfs: cache dax_device lookup result
  2017-08-25  0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams
@ 2017-08-25  0:35 ` Dan Williams
  2017-08-25  5:32   ` Darrick J. Wong
  2017-08-25  7:01   ` Christoph Hellwig
  2017-08-25  0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams
  1 sibling, 2 replies; 7+ messages in thread
From: Dan Williams @ 2017-08-25  0:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, linux-nvdimm, jack, hch, Darrick J. Wong

The ->iomap_begin() operation is a hot path, so cache the
fs_dax_get_by_host() result to avoid the incurring the hash lookup
overhead.

Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/xfs/xfs_aops.c  |   24 ++++++++++++++++++++++++
 fs/xfs/xfs_aops.h  |    1 +
 fs/xfs/xfs_buf.h   |    1 +
 fs/xfs/xfs_iomap.c |    9 +--------
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6bf120bb1a17..d16e013e1b8c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -80,6 +80,30 @@ xfs_find_bdev_for_inode(
 		return mp->m_ddev_targp->bt_bdev;
 }
 
+struct dax_device *
+xfs_find_daxdev_for_inode(
+	struct inode		*inode)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct block_device	*bdev;
+	xfs_buftarg_t		*bt;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		bt = mp->m_rtdev_targp;
+	else
+		bt = mp->m_ddev_targp;
+
+	bdev = bt->bt_bdev;
+	if (!blk_queue_dax(bdev->bd_queue))
+		return NULL;
+
+	if (!bt->bt_daxdev)
+		bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
+
+	return bt->bt_daxdev;
+}
+
 /*
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index cc174ec6c2fd..88c85ea63da0 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -59,5 +59,6 @@ int	xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
 
 extern void xfs_count_page_state(struct page *, int *, int *);
 extern struct block_device *xfs_find_bdev_for_inode(struct inode *);
+extern struct dax_device *xfs_find_daxdev_for_inode(struct inode *);
 
 #endif /* __XFS_AOPS_H__ */
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 20721261dae5..fc95a02f272a 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -108,6 +108,7 @@ typedef unsigned int xfs_buf_flags_t;
 typedef struct xfs_buftarg {
 	dev_t			bt_dev;
 	struct block_device	*bt_bdev;
+	struct dax_device	*bt_daxdev;
 	struct xfs_mount	*bt_mount;
 	unsigned int		bt_meta_sectorsize;
 	size_t			bt_meta_sectormask;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 813394c62849..3e8b9ec9e802 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -69,6 +69,7 @@ xfs_bmbt_to_iomap(
 	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
 	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
 	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
 }
 
 xfs_extlen_t
@@ -976,7 +977,6 @@ xfs_file_iomap_begin(
 	int			nimaps = 1, error = 0;
 	bool			shared = false, trimmed = false;
 	unsigned		lockmode;
-	struct block_device	*bdev;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
@@ -1087,13 +1087,6 @@ xfs_file_iomap_begin(
 
 	xfs_bmbt_to_iomap(ip, iomap, &imap);
 
-	/* optionally associate a dax device with the iomap bdev */
-	bdev = iomap->bdev;
-	if (blk_queue_dax(bdev->bd_queue))
-		iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
-	else
-		iomap->dax_dev = NULL;
-
 	if (shared)
 		iomap->flags |= IOMAP_F_SHARED;
 	return 0;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] ext2, ext4: cache dax_device lookup result
  2017-08-25  0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams
  2017-08-25  0:35 ` [PATCH 1/2] xfs: " Dan Williams
@ 2017-08-25  0:35 ` Dan Williams
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2017-08-25  0:35 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Theodore Ts'o, linux-nvdimm, linux-kernel, Andreas Dilger,
	Jan Kara, jack, hch

The ->iomap_begin() operation is a hot path, so cache the
fs_dax_get_by_host() result to avoid the incurring the hash lookup
overhead.

Cc: Jan Kara <jack@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/ext2/inode.c    |   11 +++++++----
 fs/ext4/inode.c    |   11 +++++++----
 include/linux/fs.h |    3 +++
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 30163d007b2f..910e6d502137 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -801,6 +801,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		unsigned flags, struct iomap *iomap)
 {
 	struct block_device *bdev;
+	struct dax_device *dax_dev;
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block = offset >> blkbits;
 	unsigned long max_blocks = (length + (1 << blkbits) - 1) >> blkbits;
@@ -817,10 +818,12 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	bdev = inode->i_sb->s_bdev;
 	iomap->bdev = bdev;
 	iomap->offset = (u64)first_block << blkbits;
-	if (blk_queue_dax(bdev->bd_queue))
-		iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
-	else
-		iomap->dax_dev = NULL;
+	dax_dev = inode->i_sb->s_daxdev;
+	if (blk_queue_dax(bdev->bd_queue) && !dax_dev) {
+		dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
+		inode->i_sb->s_daxdev = dax_dev;
+	}
+	iomap->dax_dev = dax_dev;
 
 	if (ret == 0) {
 		iomap->type = IOMAP_HOLE;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c774bdc22759..36a4bcab9e0c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3405,6 +3405,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
 	struct block_device *bdev;
+	struct dax_device *dax_dev;
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block = offset >> blkbits;
 	unsigned long last_block = (offset + length - 1) >> blkbits;
@@ -3475,10 +3476,12 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	iomap->flags = 0;
 	bdev = inode->i_sb->s_bdev;
 	iomap->bdev = bdev;
-	if (blk_queue_dax(bdev->bd_queue))
-		iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
-	else
-		iomap->dax_dev = NULL;
+	dax_dev = inode->i_sb->s_daxdev;
+	if (blk_queue_dax(bdev->bd_queue) && !dax_dev) {
+		dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
+		inode->i_sb->s_daxdev = dax_dev;
+	}
+	iomap->dax_dev = dax_dev;
 	iomap->offset = first_block << blkbits;
 
 	if (ret == 0) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..0594867a172a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1334,6 +1334,9 @@ struct super_block {
 	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
+#ifdef CONFIG_FS_DAX
+	struct dax_device	*s_daxdev;	/* cached dax_device */
+#endif
 	struct backing_dev_info *s_bdi;
 	struct mtd_info		*s_mtd;
 	struct hlist_node	s_instances;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] xfs: cache dax_device lookup result
  2017-08-25  0:35 ` [PATCH 1/2] xfs: " Dan Williams
@ 2017-08-25  5:32   ` Darrick J. Wong
  2017-08-25  6:24     ` Dan Williams
  2017-08-25  7:01   ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2017-08-25  5:32 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-fsdevel, linux-kernel, linux-nvdimm, jack, hch

On Thu, Aug 24, 2017 at 05:35:48PM -0700, Dan Williams wrote:
> The ->iomap_begin() operation is a hot path, so cache the
> fs_dax_get_by_host() result to avoid the incurring the hash lookup
> overhead.

Just out of curiosity (and sorry if this has already been discussed to
death and I'm merely ignorant) but I was wondering why the daxdev isn't
simply attached to the block_device?  Is it not a 1:1 mapping?

--D

> 
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Reported-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/xfs/xfs_aops.c  |   24 ++++++++++++++++++++++++
>  fs/xfs/xfs_aops.h  |    1 +
>  fs/xfs/xfs_buf.h   |    1 +
>  fs/xfs/xfs_iomap.c |    9 +--------
>  4 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 6bf120bb1a17..d16e013e1b8c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -80,6 +80,30 @@ xfs_find_bdev_for_inode(
>  		return mp->m_ddev_targp->bt_bdev;
>  }
>  
> +struct dax_device *
> +xfs_find_daxdev_for_inode(
> +	struct inode		*inode)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct block_device	*bdev;
> +	xfs_buftarg_t		*bt;
> +
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		bt = mp->m_rtdev_targp;
> +	else
> +		bt = mp->m_ddev_targp;
> +
> +	bdev = bt->bt_bdev;
> +	if (!blk_queue_dax(bdev->bd_queue))
> +		return NULL;
> +
> +	if (!bt->bt_daxdev)
> +		bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
> +
> +	return bt->bt_daxdev;
> +}
> +
>  /*
>   * We're now finished for good with this page.  Update the page state via the
>   * associated buffer_heads, paying attention to the start and end offsets that
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index cc174ec6c2fd..88c85ea63da0 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -59,5 +59,6 @@ int	xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
>  
>  extern void xfs_count_page_state(struct page *, int *, int *);
>  extern struct block_device *xfs_find_bdev_for_inode(struct inode *);
> +extern struct dax_device *xfs_find_daxdev_for_inode(struct inode *);
>  
>  #endif /* __XFS_AOPS_H__ */
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 20721261dae5..fc95a02f272a 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -108,6 +108,7 @@ typedef unsigned int xfs_buf_flags_t;
>  typedef struct xfs_buftarg {
>  	dev_t			bt_dev;
>  	struct block_device	*bt_bdev;
> +	struct dax_device	*bt_daxdev;
>  	struct xfs_mount	*bt_mount;
>  	unsigned int		bt_meta_sectorsize;
>  	size_t			bt_meta_sectormask;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 813394c62849..3e8b9ec9e802 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -69,6 +69,7 @@ xfs_bmbt_to_iomap(
>  	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
>  	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
>  	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
> +	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
>  }
>  
>  xfs_extlen_t
> @@ -976,7 +977,6 @@ xfs_file_iomap_begin(
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false, trimmed = false;
>  	unsigned		lockmode;
> -	struct block_device	*bdev;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
> @@ -1087,13 +1087,6 @@ xfs_file_iomap_begin(
>  
>  	xfs_bmbt_to_iomap(ip, iomap, &imap);
>  
> -	/* optionally associate a dax device with the iomap bdev */
> -	bdev = iomap->bdev;
> -	if (blk_queue_dax(bdev->bd_queue))
> -		iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
> -	else
> -		iomap->dax_dev = NULL;
> -
>  	if (shared)
>  		iomap->flags |= IOMAP_F_SHARED;
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] xfs: cache dax_device lookup result
  2017-08-25  5:32   ` Darrick J. Wong
@ 2017-08-25  6:24     ` Dan Williams
  2017-08-25  7:02       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2017-08-25  6:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, Jan Kara, Christoph Hellwig

On Thu, Aug 24, 2017 at 10:32 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Thu, Aug 24, 2017 at 05:35:48PM -0700, Dan Williams wrote:
>> The ->iomap_begin() operation is a hot path, so cache the
>> fs_dax_get_by_host() result to avoid the incurring the hash lookup
>> overhead.
>
> Just out of curiosity (and sorry if this has already been discussed to
> death and I'm merely ignorant) but I was wondering why the daxdev isn't
> simply attached to the block_device?  Is it not a 1:1 mapping?

The motivation was to remove all dax details from the block layer.
However, we still have the QUEUE_DAX flag which defeats that goal.
Hmm, a dax_device is similar to a request_queue as a path to access
capacity, so I think it might make sense to just replace the queue
flag with a full ->dax_dev pointer hanging off the gendisk.

This would also fix the bug I just realized was in these patches
around matching dax_get_by_host() with put_dax(). In fact, if it's
acceptable to hang a dax_dev off a gendisk then we don't need
dax_get_by_host() at all.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] xfs: cache dax_device lookup result
  2017-08-25  0:35 ` [PATCH 1/2] xfs: " Dan Williams
  2017-08-25  5:32   ` Darrick J. Wong
@ 2017-08-25  7:01   ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-08-25  7:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-fsdevel, linux-kernel, linux-nvdimm, jack, hch,
	Darrick J. Wong

> +	bdev = bt->bt_bdev;
> +	if (!blk_queue_dax(bdev->bd_queue))
> +		return NULL;
> +
> +	if (!bt->bt_daxdev)
> +		bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
> +
> +	return bt->bt_daxdev;

Please do this at mount time.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] xfs: cache dax_device lookup result
  2017-08-25  6:24     ` Dan Williams
@ 2017-08-25  7:02       ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-08-25  7:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Darrick J. Wong, linux-fsdevel, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, Jan Kara, Christoph Hellwig

On Thu, Aug 24, 2017 at 11:24:10PM -0700, Dan Williams wrote:
> This would also fix the bug I just realized was in these patches
> around matching dax_get_by_host() with put_dax(). In fact, if it's
> acceptable to hang a dax_dev off a gendisk then we don't need
> dax_get_by_host() at all.

I want to get the block code out of the loop entirely.  I'm a little
overloaded with work at the moment, but my plan was to add a mount_dax
equivalent to mount_bdev, and add support to the XFS buffer cache and
log code to go straight to DAX.  For the buffer cache we'd still have
to double buffer to provide transaction gurantees, but for the log
formatting it directly to memory should also provide nice speedups.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-08-25  7:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-25  0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams
2017-08-25  0:35 ` [PATCH 1/2] xfs: " Dan Williams
2017-08-25  5:32   ` Darrick J. Wong
2017-08-25  6:24     ` Dan Williams
2017-08-25  7:02       ` Christoph Hellwig
2017-08-25  7:01   ` Christoph Hellwig
2017-08-25  0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).