* [PATCH 0/2] fs: cache dax_device lookup result @ 2017-08-25 0:35 Dan Williams 2017-08-25 0:35 ` [PATCH 1/2] xfs: " Dan Williams 2017-08-25 0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams 0 siblings, 2 replies; 7+ messages in thread From: Dan Williams @ 2017-08-25 0:35 UTC (permalink / raw) To: linux-fsdevel Cc: jack, Darrick J. Wong, linux-nvdimm, linux-kernel, Andreas Dilger, Jan Kara, Theodore Ts'o, hch Christoph notes: I just noticed that we now do a fs_dax_get_by_host in every iomap_begin call for DAX. This function iterates a list, does a string compared and igrab. I really think we need to cache this in the superblock (possible even the fs superblock) similar to what we do for the block device. Fix this up to cache the result of the dax_device lookup in 'struct xfs_mount' for xfs and 'struct super_block' for ext2/ext4. --- Dan Williams (2): xfs: cache dax_device lookup result ext2, ext4: cache dax_device lookup result fs/ext2/inode.c | 11 +++++++---- fs/ext4/inode.c | 11 +++++++---- fs/xfs/xfs_aops.c | 24 ++++++++++++++++++++++++ fs/xfs/xfs_aops.h | 1 + fs/xfs/xfs_buf.h | 1 + fs/xfs/xfs_iomap.c | 9 +-------- include/linux/fs.h | 3 +++ 7 files changed, 44 insertions(+), 16 deletions(-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] xfs: cache dax_device lookup result 2017-08-25 0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams @ 2017-08-25 0:35 ` Dan Williams 2017-08-25 5:32 ` Darrick J. Wong 2017-08-25 7:01 ` Christoph Hellwig 2017-08-25 0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams 1 sibling, 2 replies; 7+ messages in thread From: Dan Williams @ 2017-08-25 0:35 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, linux-nvdimm, jack, hch, Darrick J. Wong The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result to avoid the incurring the hash lookup overhead. Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- fs/xfs/xfs_aops.c | 24 ++++++++++++++++++++++++ fs/xfs/xfs_aops.h | 1 + fs/xfs/xfs_buf.h | 1 + fs/xfs/xfs_iomap.c | 9 +-------- 4 files changed, 27 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 6bf120bb1a17..d16e013e1b8c 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -80,6 +80,30 @@ xfs_find_bdev_for_inode( return mp->m_ddev_targp->bt_bdev; } +struct dax_device * +xfs_find_daxdev_for_inode( + struct inode *inode) +{ + struct xfs_inode *ip = XFS_I(inode); + struct xfs_mount *mp = ip->i_mount; + struct block_device *bdev; + xfs_buftarg_t *bt; + + if (XFS_IS_REALTIME_INODE(ip)) + bt = mp->m_rtdev_targp; + else + bt = mp->m_ddev_targp; + + bdev = bt->bt_bdev; + if (!blk_queue_dax(bdev->bd_queue)) + return NULL; + + if (!bt->bt_daxdev) + bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name); + + return bt->bt_daxdev; +} + /* * We're now finished for good with this page. Update the page state via the * associated buffer_heads, paying attention to the start and end offsets that diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h index cc174ec6c2fd..88c85ea63da0 100644 --- a/fs/xfs/xfs_aops.h +++ b/fs/xfs/xfs_aops.h @@ -59,5 +59,6 @@ int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size); extern void xfs_count_page_state(struct page *, int *, int *); extern struct block_device *xfs_find_bdev_for_inode(struct inode *); +extern struct dax_device *xfs_find_daxdev_for_inode(struct inode *); #endif /* __XFS_AOPS_H__ */ diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 20721261dae5..fc95a02f272a 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -108,6 +108,7 @@ typedef unsigned int xfs_buf_flags_t; typedef struct xfs_buftarg { dev_t bt_dev; struct block_device *bt_bdev; + struct dax_device *bt_daxdev; struct xfs_mount *bt_mount; unsigned int bt_meta_sectorsize; size_t bt_meta_sectormask; diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 813394c62849..3e8b9ec9e802 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -69,6 +69,7 @@ xfs_bmbt_to_iomap( iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff); iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount); iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip)); + iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip)); } xfs_extlen_t @@ -976,7 +977,6 @@ xfs_file_iomap_begin( int nimaps = 1, error = 0; bool shared = false, trimmed = false; unsigned lockmode; - struct block_device *bdev; if (XFS_FORCED_SHUTDOWN(mp)) return -EIO; @@ -1087,13 +1087,6 @@ xfs_file_iomap_begin( xfs_bmbt_to_iomap(ip, iomap, &imap); - /* optionally associate a dax device with the iomap bdev */ - bdev = iomap->bdev; - if (blk_queue_dax(bdev->bd_queue)) - iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); - else - iomap->dax_dev = NULL; - if (shared) iomap->flags |= IOMAP_F_SHARED; return 0; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: cache dax_device lookup result 2017-08-25 0:35 ` [PATCH 1/2] xfs: " Dan Williams @ 2017-08-25 5:32 ` Darrick J. Wong 2017-08-25 6:24 ` Dan Williams 2017-08-25 7:01 ` Christoph Hellwig 1 sibling, 1 reply; 7+ messages in thread From: Darrick J. Wong @ 2017-08-25 5:32 UTC (permalink / raw) To: Dan Williams; +Cc: linux-fsdevel, linux-kernel, linux-nvdimm, jack, hch On Thu, Aug 24, 2017 at 05:35:48PM -0700, Dan Williams wrote: > The ->iomap_begin() operation is a hot path, so cache the > fs_dax_get_by_host() result to avoid the incurring the hash lookup > overhead. Just out of curiosity (and sorry if this has already been discussed to death and I'm merely ignorant) but I was wondering why the daxdev isn't simply attached to the block_device? Is it not a 1:1 mapping? --D > > Cc: "Darrick J. Wong" <darrick.wong@oracle.com> > Reported-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > fs/xfs/xfs_aops.c | 24 ++++++++++++++++++++++++ > fs/xfs/xfs_aops.h | 1 + > fs/xfs/xfs_buf.h | 1 + > fs/xfs/xfs_iomap.c | 9 +-------- > 4 files changed, 27 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 6bf120bb1a17..d16e013e1b8c 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -80,6 +80,30 @@ xfs_find_bdev_for_inode( > return mp->m_ddev_targp->bt_bdev; > } > > +struct dax_device * > +xfs_find_daxdev_for_inode( > + struct inode *inode) > +{ > + struct xfs_inode *ip = XFS_I(inode); > + struct xfs_mount *mp = ip->i_mount; > + struct block_device *bdev; > + xfs_buftarg_t *bt; > + > + if (XFS_IS_REALTIME_INODE(ip)) > + bt = mp->m_rtdev_targp; > + else > + bt = mp->m_ddev_targp; > + > + bdev = bt->bt_bdev; > + if (!blk_queue_dax(bdev->bd_queue)) > + return NULL; > + > + if (!bt->bt_daxdev) > + bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name); > + > + return bt->bt_daxdev; > +} > + > /* > * We're now finished for good with this page. Update the page state via the > * associated buffer_heads, paying attention to the start and end offsets that > diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h > index cc174ec6c2fd..88c85ea63da0 100644 > --- a/fs/xfs/xfs_aops.h > +++ b/fs/xfs/xfs_aops.h > @@ -59,5 +59,6 @@ int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size); > > extern void xfs_count_page_state(struct page *, int *, int *); > extern struct block_device *xfs_find_bdev_for_inode(struct inode *); > +extern struct dax_device *xfs_find_daxdev_for_inode(struct inode *); > > #endif /* __XFS_AOPS_H__ */ > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h > index 20721261dae5..fc95a02f272a 100644 > --- a/fs/xfs/xfs_buf.h > +++ b/fs/xfs/xfs_buf.h > @@ -108,6 +108,7 @@ typedef unsigned int xfs_buf_flags_t; > typedef struct xfs_buftarg { > dev_t bt_dev; > struct block_device *bt_bdev; > + struct dax_device *bt_daxdev; > struct xfs_mount *bt_mount; > unsigned int bt_meta_sectorsize; > size_t bt_meta_sectormask; > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 813394c62849..3e8b9ec9e802 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -69,6 +69,7 @@ xfs_bmbt_to_iomap( > iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff); > iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount); > iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip)); > + iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip)); > } > > xfs_extlen_t > @@ -976,7 +977,6 @@ xfs_file_iomap_begin( > int nimaps = 1, error = 0; > bool shared = false, trimmed = false; > unsigned lockmode; > - struct block_device *bdev; > > if (XFS_FORCED_SHUTDOWN(mp)) > return -EIO; > @@ -1087,13 +1087,6 @@ xfs_file_iomap_begin( > > xfs_bmbt_to_iomap(ip, iomap, &imap); > > - /* optionally associate a dax device with the iomap bdev */ > - bdev = iomap->bdev; > - if (blk_queue_dax(bdev->bd_queue)) > - iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); > - else > - iomap->dax_dev = NULL; > - > if (shared) > iomap->flags |= IOMAP_F_SHARED; > return 0; > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: cache dax_device lookup result 2017-08-25 5:32 ` Darrick J. Wong @ 2017-08-25 6:24 ` Dan Williams 2017-08-25 7:02 ` Christoph Hellwig 0 siblings, 1 reply; 7+ messages in thread From: Dan Williams @ 2017-08-25 6:24 UTC (permalink / raw) To: Darrick J. Wong Cc: linux-fsdevel, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, Jan Kara, Christoph Hellwig On Thu, Aug 24, 2017 at 10:32 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Thu, Aug 24, 2017 at 05:35:48PM -0700, Dan Williams wrote: >> The ->iomap_begin() operation is a hot path, so cache the >> fs_dax_get_by_host() result to avoid the incurring the hash lookup >> overhead. > > Just out of curiosity (and sorry if this has already been discussed to > death and I'm merely ignorant) but I was wondering why the daxdev isn't > simply attached to the block_device? Is it not a 1:1 mapping? The motivation was to remove all dax details from the block layer. However, we still have the QUEUE_DAX flag which defeats that goal. Hmm, a dax_device is similar to a request_queue as a path to access capacity, so I think it might make sense to just replace the queue flag with a full ->dax_dev pointer hanging off the gendisk. This would also fix the bug I just realized was in these patches around matching dax_get_by_host() with put_dax(). In fact, if it's acceptable to hang a dax_dev off a gendisk then we don't need dax_get_by_host() at all. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: cache dax_device lookup result 2017-08-25 6:24 ` Dan Williams @ 2017-08-25 7:02 ` Christoph Hellwig 0 siblings, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2017-08-25 7:02 UTC (permalink / raw) To: Dan Williams Cc: Darrick J. Wong, linux-fsdevel, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, Jan Kara, Christoph Hellwig On Thu, Aug 24, 2017 at 11:24:10PM -0700, Dan Williams wrote: > This would also fix the bug I just realized was in these patches > around matching dax_get_by_host() with put_dax(). In fact, if it's > acceptable to hang a dax_dev off a gendisk then we don't need > dax_get_by_host() at all. I want to get the block code out of the loop entirely. I'm a little overloaded with work at the moment, but my plan was to add a mount_dax equivalent to mount_bdev, and add support to the XFS buffer cache and log code to go straight to DAX. For the buffer cache we'd still have to double buffer to provide transaction gurantees, but for the log formatting it directly to memory should also provide nice speedups. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: cache dax_device lookup result 2017-08-25 0:35 ` [PATCH 1/2] xfs: " Dan Williams 2017-08-25 5:32 ` Darrick J. Wong @ 2017-08-25 7:01 ` Christoph Hellwig 1 sibling, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2017-08-25 7:01 UTC (permalink / raw) To: Dan Williams Cc: linux-fsdevel, linux-kernel, linux-nvdimm, jack, hch, Darrick J. Wong > + bdev = bt->bt_bdev; > + if (!blk_queue_dax(bdev->bd_queue)) > + return NULL; > + > + if (!bt->bt_daxdev) > + bt->bt_daxdev = fs_dax_get_by_host(bdev->bd_disk->disk_name); > + > + return bt->bt_daxdev; Please do this at mount time. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] ext2, ext4: cache dax_device lookup result 2017-08-25 0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams 2017-08-25 0:35 ` [PATCH 1/2] xfs: " Dan Williams @ 2017-08-25 0:35 ` Dan Williams 1 sibling, 0 replies; 7+ messages in thread From: Dan Williams @ 2017-08-25 0:35 UTC (permalink / raw) To: linux-fsdevel Cc: Theodore Ts'o, linux-nvdimm, linux-kernel, Andreas Dilger, Jan Kara, jack, hch The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result to avoid the incurring the hash lookup overhead. Cc: Jan Kara <jack@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- fs/ext2/inode.c | 11 +++++++---- fs/ext4/inode.c | 11 +++++++---- include/linux/fs.h | 3 +++ 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 30163d007b2f..910e6d502137 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -801,6 +801,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap) { struct block_device *bdev; + struct dax_device *dax_dev; unsigned int blkbits = inode->i_blkbits; unsigned long first_block = offset >> blkbits; unsigned long max_blocks = (length + (1 << blkbits) - 1) >> blkbits; @@ -817,10 +818,12 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length, bdev = inode->i_sb->s_bdev; iomap->bdev = bdev; iomap->offset = (u64)first_block << blkbits; - if (blk_queue_dax(bdev->bd_queue)) - iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); - else - iomap->dax_dev = NULL; + dax_dev = inode->i_sb->s_daxdev; + if (blk_queue_dax(bdev->bd_queue) && !dax_dev) { + dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); + inode->i_sb->s_daxdev = dax_dev; + } + iomap->dax_dev = dax_dev; if (ret == 0) { iomap->type = IOMAP_HOLE; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c774bdc22759..36a4bcab9e0c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3405,6 +3405,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap) { struct block_device *bdev; + struct dax_device *dax_dev; unsigned int blkbits = inode->i_blkbits; unsigned long first_block = offset >> blkbits; unsigned long last_block = (offset + length - 1) >> blkbits; @@ -3475,10 +3476,12 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, iomap->flags = 0; bdev = inode->i_sb->s_bdev; iomap->bdev = bdev; - if (blk_queue_dax(bdev->bd_queue)) - iomap->dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); - else - iomap->dax_dev = NULL; + dax_dev = inode->i_sb->s_daxdev; + if (blk_queue_dax(bdev->bd_queue) && !dax_dev) { + dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name); + inode->i_sb->s_daxdev = dax_dev; + } + iomap->dax_dev = dax_dev; iomap->offset = first_block << blkbits; if (ret == 0) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 6e1fd5d21248..0594867a172a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1334,6 +1334,9 @@ struct super_block { struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */ struct list_head s_mounts; /* list of mounts; _not_ for fs use */ struct block_device *s_bdev; +#ifdef CONFIG_FS_DAX + struct dax_device *s_daxdev; /* cached dax_device */ +#endif struct backing_dev_info *s_bdi; struct mtd_info *s_mtd; struct hlist_node s_instances; ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-25 7:02 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-25 0:35 [PATCH 0/2] fs: cache dax_device lookup result Dan Williams 2017-08-25 0:35 ` [PATCH 1/2] xfs: " Dan Williams 2017-08-25 5:32 ` Darrick J. Wong 2017-08-25 6:24 ` Dan Williams 2017-08-25 7:02 ` Christoph Hellwig 2017-08-25 7:01 ` Christoph Hellwig 2017-08-25 0:35 ` [PATCH 2/2] ext2, ext4: " Dan Williams
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).