From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 5/8] xfs: introduce the XFS_IOC_GETFSMAP ioctl
Date: Thu, 23 Feb 2017 16:54:18 -0800 [thread overview]
Message-ID: <20170224005418.GJ5846@birch.djwong.org> (raw)
In-Reply-To: <20170223234301.GA57524@bfoster.bfoster>
On Thu, Feb 23, 2017 at 06:43:01PM -0500, Brian Foster wrote:
> On Thu, Feb 23, 2017 at 12:44:05PM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 23, 2017 at 09:45:43AM -0500, Brian Foster wrote:
> > > On Wed, Feb 22, 2017 at 01:17:57PM -0800, Darrick J. Wong wrote:
> > > > On Wed, Feb 22, 2017 at 10:02:47AM -0500, Brian Foster wrote:
> > > > > On Fri, Feb 17, 2017 at 05:17:49PM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > >
> > > > > > Introduce a new ioctl that uses the reverse mapping btree to return
> > > > > > information about the physical layout of the filesystem.
> > > > > >
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > >
> ...
> > > So the rmapbt query doesn't incorporate the full getfsmap key in the
> > > search and we thus we have to include finer grained filtering..? If so,
> > > I think this bit could be noted explicitly in the comment..
> >
> > It's a funny quirk of how queries have to work when record keys contain
> > multiple overlapping ranges interacting with the current design of
> > query_range.
> >
> > rmap tuple format is still (start, length, owner, offset).
> >
> > Say you have the rmaps (24, 8, 128, 0) and (30, 10, 128, 8). The low
> > key of the first rmap is (24, 128, 0) and the high key is (31, 128, 7).
> > That way you can query the rmapbt for start == 27 and it'll return the
> > above rmap. _btree_query_range was designed to return any record
> > overlapping with any part of the interval, even if the record start or
> > record end themselves are not within the interval.
> >
> > However, if you want to look for the next tuple after (24, 8, 128, 0)
> > you can't just tell it to search for (31...) because 31 > 30 and it'll
> > miss that second rmap. You can however tell it to search (24, 128, 8)
> > and ignore any records if any part of the low key is not greater than
> > the low search key.
> >
> > Theoretically, we could enhance query_range to take an operator so that
> > you could tell it to return any record overlapping with any part of the
> > interval so long as the start of the record is strictly greater than the
> > start of the query interval.
> >
> > FWIW the comment for _btree_query_range says it returns all records
> > overlapping with the interval passed in.
> >
>
> Gotcha, thanks.
>
> > > Also, I'm kind of wondering why we couldn't have just set next_daddr to
> > > 40 in the first place based on the low key. Is there some other corner
> > > case that breaks..?
> >
> > Aha, looking through my notes, the original version also used next_daddr
> > (buggily) to decide if a record actually started before the bumped low
> > key so that it could ignore it. Subsequent revisions created an
> > explicit test function (_getfsmap_rec_before_low_key) so you're right,
> > we can set next_daddr to 40 (fmh_keys[0]->fmr_physical + fmr_length) for
> > the first loop iteration and zero for subsequent iterations. With that
> > the key_end business also goes away.
> >
>
> Oh, Ok. That sounds like a nice cleanup then.
>
> > > > Assuming _getfsmap_helper is passed the same refcount rmap as before.
> > > > rec_daddr = 64, and when we get to line 318, key_end = 24 + 16.
> > > >
> > > > The test is now:
> > > > if (devices match && 40 < 40 && 64 >= 40)
> > > > So we leave next_daddr at 40.
> > > >
> > > > The correct output here is to synthesize an fsmap record for free space
> > > > between 40-64, and then to emit the refcount record at 64.
> > > >
> > > > Third example: fmh_keys = [(8:0, 24, 16, 128, 0), (-1)] and next_daddr = 24
> > > > as before. _getfsmap_helper again sees (8:0, 24, 16, 128, 0) and sets
> > > > next_daddr = 40.
> > > >
> > > > This time, however, _getfsmap_helper is passed (8:0, 32, 8, 129, 0),
> > > > which is 8 sectors of inode 129's data at offset 0. rec_daddr = 32,
> > > > key_end = 24 + 16 as before.
> > > >
> > > > The test is now:
> > > > if (devices match && 40 < 40 && 32 >= 40)
> > > > So we again leave next_daddr at 40, then emit the fsmap for inode 129.
> > > >
> ...
> > > > >
> > > > > Hmm.. could we combine these into one call that looks like:
> > > > >
> > > > > trace_xfs_getfsmap(mp, &fmh_keys[0], &fmh_keys[1]);
> > > > >
> > > > > ... and has the trace handler pull the relevant data out of the key
> > > > > structure (same comment for the similar trace_xfs_fsmap*())?
> > > >
> > > > I'd prefer to leave it as-is, because passing struct xfs_fsmap into a
> > > > tracepoint requires every file that includes xfs_trace.h to also include
> > > > xfs_fsmap.h to get the structure definition.
> > > >
> > >
> > > I think you can get around that with a structure declaration in
> > > xfs_trace.h, as the only code that actually requires the full definition
> > > is xfs_trace.c. If that works, that could at least reduce the tracepoint
> > > calls to a couple lines of code, even if we retain the independent
> > > low/high key tp's.
> >
> > But then we'd have two definitions of the same structure, and anyone
> > touching xfs_fsmap would have to remember to update both. I think it's
> > fine to pass pointers to core libxfs/*format.h structures directly into
> > tracepoints, but fsmap is on the periphery.
> >
>
> No, it's just a declaration (not the full structure definition). I.e.,
> add 'struct xfs_fsmap;' to the list of similarly declared structures at
> the top of xfs_trace.h.
>
> See the appended diff for what I mean. It only changes one of the
> classes, but compiles clean for me (compile tested only)..
Oh, you're right, that does work. I forgot that xfs_trace.h is a magic
headerfile on account of CREATE_TRACE_POINTS. Ok, I'll go fix that too.
--D
>
> Brian
>
> --- 8< ---
>
> diff --git a/fs/xfs/xfs_fsmap.h b/fs/xfs/xfs_fsmap.h
> index 1943047..55f7c85 100644
> --- a/fs/xfs/xfs_fsmap.h
> +++ b/fs/xfs/xfs_fsmap.h
> @@ -20,6 +20,8 @@
> #ifndef __XFS_FSMAP_H__
> #define __XFS_FSMAP_H__
>
> +struct fsmap;
> +
> /* internal fsmap representation */
> struct xfs_fsmap {
> dev_t fmr_device; /* device id */
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index e1f3fbf..95f4923 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1622,9 +1622,7 @@ xfs_getfsmap_format(struct xfs_fsmap *xfm, void *priv)
> struct getfsmap_info *info = priv;
> struct fsmap fm;
>
> - trace_xfs_getfsmap_mapping(info->mp, xfm->fmr_device, xfm->fmr_physical,
> - xfm->fmr_length, xfm->fmr_owner, xfm->fmr_offset,
> - xfm->fmr_flags);
> + trace_xfs_getfsmap_mapping(info->mp, xfm);
>
> info->last_flags = xfm->fmr_flags;
> xfs_fsmap_from_internal(&fm, xfm);
> @@ -1664,21 +1662,8 @@ xfs_ioc_getfsmap(
> xfs_fsmap_to_internal(&xhead.fmh_keys[0], &head.fmh_keys[0]);
> xfs_fsmap_to_internal(&xhead.fmh_keys[1], &head.fmh_keys[1]);
>
> - trace_xfs_getfsmap_low_key(ip->i_mount,
> - xhead.fmh_keys[0].fmr_device,
> - xhead.fmh_keys[0].fmr_physical,
> - xhead.fmh_keys[0].fmr_length,
> - xhead.fmh_keys[0].fmr_owner,
> - xhead.fmh_keys[0].fmr_offset,
> - xhead.fmh_keys[0].fmr_flags);
> -
> - trace_xfs_getfsmap_high_key(ip->i_mount,
> - xhead.fmh_keys[1].fmr_device,
> - xhead.fmh_keys[1].fmr_physical,
> - xhead.fmh_keys[1].fmr_length,
> - xhead.fmh_keys[1].fmr_owner,
> - xhead.fmh_keys[1].fmr_offset,
> - xhead.fmh_keys[1].fmr_flags);
> + trace_xfs_getfsmap_low_key(ip->i_mount, &xhead.fmh_keys[0]);
> + trace_xfs_getfsmap_high_key(ip->i_mount, &xhead.fmh_keys[1]);
>
> info.mp = ip->i_mount;
> info.data = ((__force struct fsmap_head *)arg)->fmh_recs;
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index 7f17ae6..5d95fe3 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -47,6 +47,7 @@
> #include "xfs_inode_item.h"
> #include "xfs_bmap_btree.h"
> #include "xfs_filestream.h"
> +#include "xfs_fsmap.h"
>
> /*
> * We include this last to have the helpers above available for the trace
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index dbfc4db..a8eaf34 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -40,6 +40,7 @@ struct xfs_inode_log_format;
> struct xfs_bmbt_irec;
> struct xfs_btree_cur;
> struct xfs_refcount_irec;
> +struct xfs_fsmap;
>
> DECLARE_EVENT_CLASS(xfs_attr_list_class,
> TP_PROTO(struct xfs_attr_list_context *ctx),
> @@ -3311,10 +3312,8 @@ DEFINE_FSMAP_EVENT(xfs_fsmap_high_key);
> DEFINE_FSMAP_EVENT(xfs_fsmap_mapping);
>
> DECLARE_EVENT_CLASS(xfs_getfsmap_class,
> - TP_PROTO(struct xfs_mount *mp, u32 keydev, xfs_daddr_t block,
> - xfs_daddr_t len, __uint64_t owner, __uint64_t offset,
> - __uint64_t flags),
> - TP_ARGS(mp, keydev, block, len, owner, offset, flags),
> + TP_PROTO(struct xfs_mount *mp, struct xfs_fsmap *fsmap),
> + TP_ARGS(mp, fsmap),
> TP_STRUCT__entry(
> __field(dev_t, dev)
> __field(dev_t, keydev)
> @@ -3326,12 +3325,12 @@ DECLARE_EVENT_CLASS(xfs_getfsmap_class,
> ),
> TP_fast_assign(
> __entry->dev = mp->m_super->s_dev;
> - __entry->keydev = new_decode_dev(keydev);
> - __entry->block = block;
> - __entry->len = len;
> - __entry->owner = owner;
> - __entry->offset = offset;
> - __entry->flags = flags;
> + __entry->keydev = new_decode_dev(fsmap->fmr_device);
> + __entry->block = fsmap->fmr_physical;
> + __entry->len = fsmap->fmr_length;
> + __entry->owner = fsmap->fmr_owner;
> + __entry->offset = fsmap->fmr_offset;
> + __entry->flags = fsmap->fmr_flags;
> ),
> TP_printk("dev %d:%d keydev %d:%d block %llu len %llu owner %lld offset %llu flags 0x%llx\n",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> @@ -3344,10 +3343,8 @@ DECLARE_EVENT_CLASS(xfs_getfsmap_class,
> )
> #define DEFINE_GETFSMAP_EVENT(name) \
> DEFINE_EVENT(xfs_getfsmap_class, name, \
> - TP_PROTO(struct xfs_mount *mp, u32 keydev, xfs_daddr_t block, \
> - xfs_daddr_t len, __uint64_t owner, __uint64_t offset, \
> - __uint64_t flags), \
> - TP_ARGS(mp, keydev, block, len, owner, offset, flags))
> + TP_PROTO(struct xfs_mount *mp, struct xfs_fsmap *fsmap), \
> + TP_ARGS(mp, fsmap))
> DEFINE_GETFSMAP_EVENT(xfs_getfsmap_low_key);
> DEFINE_GETFSMAP_EVENT(xfs_getfsmap_high_key);
> DEFINE_GETFSMAP_EVENT(xfs_getfsmap_mapping);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-02-24 0:55 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-18 1:17 [RFC PATCH v6 0/8] vfs/xfs/ext4: GETFSMAP support Darrick J. Wong
2017-02-18 1:17 ` [PATCH 1/8] vfs: add common GETFSMAP ioctl definitions Darrick J. Wong
2017-02-18 1:17 ` [PATCH 2/8] xfs: plumb in needed functions for range querying of the freespace btrees Darrick J. Wong
2017-02-21 14:35 ` Brian Foster
2017-02-21 17:22 ` Darrick J. Wong
2017-02-21 17:34 ` [PATCH v2 " Darrick J. Wong
2017-02-22 15:02 ` Brian Foster
2017-02-18 1:17 ` [PATCH 3/8] xfs: provide a query_range function for " Darrick J. Wong
2017-02-21 14:35 ` Brian Foster
2017-02-18 1:17 ` [PATCH 4/8] xfs: create a function to query all records in a btree Darrick J. Wong
2017-02-21 14:35 ` Brian Foster
2017-02-18 1:17 ` [PATCH 5/8] xfs: introduce the XFS_IOC_GETFSMAP ioctl Darrick J. Wong
2017-02-22 15:02 ` Brian Foster
2017-02-22 21:17 ` Darrick J. Wong
2017-02-23 14:45 ` Brian Foster
2017-02-23 20:44 ` Darrick J. Wong
2017-02-23 23:43 ` Brian Foster
2017-02-24 0:54 ` Darrick J. Wong [this message]
2017-02-18 1:17 ` [PATCH 6/8] xfs: have getfsmap fall back to the freesp btrees when rmap is not present Darrick J. Wong
2017-02-24 13:04 ` Brian Foster
2017-02-24 17:48 ` Darrick J. Wong
2017-02-24 22:33 ` Darrick J. Wong
2017-02-18 1:18 ` [PATCH 7/8] xfs: getfsmap should fall back to rtbitmap when rtrmapbt " Darrick J. Wong
2017-02-18 1:18 ` [PATCH 8/8] ext4: support GETFSMAP ioctls Darrick J. Wong
2017-02-21 22:14 ` [PATCH] ioctl_getfsmap.2: document the GETFSMAP ioctl Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170224005418.GJ5846@birch.djwong.org \
--to=darrick.wong@oracle.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).