From: Sidong Yang <realwakka@gmail.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
Qu Wenruo <quwenruo.btrfs@gmx.com>,
David Sterba <dsterba@suse.cz>
Subject: Re: [PATCH v2] btrfs-progs: cmds: Add subcommand that dumps file extents
Date: Mon, 12 Jul 2021 06:40:08 +0000 [thread overview]
Message-ID: <20210712064008.GB68357@realwakka> (raw)
In-Reply-To: <20d7b0a8-8e1c-c13a-6a94-525a110a6b0e@suse.com>
On Mon, Jul 12, 2021 at 09:16:17AM +0800, Qu Wenruo wrote:
>
>
> On 2021/7/12 上午12:10, Sidong Yang wrote:
> > This patch adds an subcommand in inspect-internal. It dumps file extents of
> > the file that user provided. It helps to show the internal information
> > about file extents comprise the file.
>
> Despite the comments inlined below for the technical details, I'm not
> determined if we really want the tool.
>
> On one hand, fiemap doesn't provide all detailed btrfs specific info, and
> it's common to utilize the tree search ioctl to do the work, just like
> "compsize".
>
> But on the other hand, I'm not sure if it provides enough info compared to
> things like "btrfs ins dump-tree".
>
> For now I don't have any objection nor preference.
>
> Thus it's again David's call on this.
>
> >
> > Signed-off-by: Sidong Yang <realwakka@gmail.com>
> > ---
> > v2:
> > - Prints type and compression
> > - Use the terms from file_extents_item like disk_bytenr not like physical"
> > ---
> > Makefile | 2 +-
> > cmds/commands.h | 2 +-
> > cmds/inspect-dump-file-extents.c | 165 +++++++++++++++++++++++++++++++
> > cmds/inspect.c | 1 +
> > 4 files changed, 168 insertions(+), 2 deletions(-)
> > create mode 100644 cmds/inspect-dump-file-extents.c
> >
> > diff --git a/Makefile b/Makefile
> > index a1cc457b..911e16de 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -156,7 +156,7 @@ cmds_objects = cmds/subvolume.o cmds/filesystem.o cmds/device.o cmds/scrub.o \
> > cmds/restore.o cmds/rescue.o cmds/rescue-chunk-recover.o \
> > cmds/rescue-super-recover.o \
> > cmds/property.o cmds/filesystem-usage.o cmds/inspect-dump-tree.o \
> > - cmds/inspect-dump-super.o cmds/inspect-tree-stats.o cmds/filesystem-du.o \
> > + cmds/inspect-dump-super.o cmds/inspect-tree-stats.o cmds/inspect-dump-file-extents.o cmds/filesystem-du.o \
> > mkfs/common.o check/mode-common.o check/mode-lowmem.o
> > libbtrfs_objects = common/send-stream.o common/send-utils.o kernel-lib/rbtree.o btrfs-list.o \
> > kernel-lib/radix-tree.o common/extent-cache.o kernel-shared/extent_io.o \
> > diff --git a/cmds/commands.h b/cmds/commands.h
> > index 8fa85d6c..55de248e 100644
> > --- a/cmds/commands.h
> > +++ b/cmds/commands.h
> > @@ -154,5 +154,5 @@ DECLARE_COMMAND(select_super);
> > DECLARE_COMMAND(dump_super);
> > DECLARE_COMMAND(debug_tree);
>
> Off-topic here.
>
> Those "dump_super" and "debug_tree" makes me wonder, do we need to cleanup
> them?
>
> I mean, we have inspect_dump_super for "btrfs ins dump-super", but what's
> "dump_super" here for?
> And what's the "debug_tree" here for?
there is no command dump_super and debug_tree. And these aren't need to
compile.
>
> > DECLARE_COMMAND(rescue);
> > -
> > +DECLARE_COMMAND(inspect_dump_file_extents);
>
> I would be better to put this line where the other "inpsect" subcommands
> are.
Okay, I'll do it.
>
> > #endif
> > diff --git a/cmds/inspect-dump-file-extents.c b/cmds/inspect-dump-file-extents.c
> > new file mode 100644
> > index 00000000..8574a1d0
> > --- /dev/null
> > +++ b/cmds/inspect-dump-file-extents.c
> > @@ -0,0 +1,165 @@
> > +/*
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public
> > + * License v2 as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; if not, write to the
> > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> > + * Boston, MA 021110-1307, USA.
> > + */
> > +
> > +#include <unistd.h>
> > +#include <stdio.h>
> > +#include <fcntl.h>
> > +
> > +#include <sys/ioctl.h>
> > +
> > +#include "common/utils.h"
> > +#include "cmds/commands.h"
> > +
> > +static const char * const cmd_inspect_dump_file_extents_usage[] = {
> > + "btrfs inspect-internal dump-extent path",
> > + "Dump file extent in a textual form",
> > + NULL
> > +};
> > +
> > +static void compress_type_to_str(u8 compress_type, char *ret)
> > +{
> > + switch (compress_type) {
> > + case BTRFS_COMPRESS_NONE:
> > + strcpy(ret, "none");
> > + break;
> > + case BTRFS_COMPRESS_ZLIB:
> > + strcpy(ret, "zlib");
> > + break;
> > + case BTRFS_COMPRESS_LZO:
> > + strcpy(ret, "lzo");
> > + break;
> > + case BTRFS_COMPRESS_ZSTD:
> > + strcpy(ret, "zstd");
> > + break;
> > + default:
> > + sprintf(ret, "UNKNOWN.%d", compress_type);
> > + }
> > +}
>
> It would be better to just export the function with the same name in
> "kernel-shared/print-tree.c" so we don't have duplicated code.
Yes, I would be better.
>
> > +
> > +static const char* file_extent_type_to_str(u8 type)
> > +{
> > + switch (type) {
> > + case BTRFS_FILE_EXTENT_INLINE: return "inline";
> > + case BTRFS_FILE_EXTENT_PREALLOC: return "prealloc";
> > + case BTRFS_FILE_EXTENT_REG: return "regular";
> > + default: return "unknown";
> > + }
> > +}
>
> The same here.
Okay.
>
> > +
> > +static int cmd_inspect_dump_file_extents(const struct cmd_struct *cmd,
> > + int argc, char **argv)
> > +{
> > + int fd;
> > + struct stat statbuf;
> > + struct btrfs_ioctl_ino_lookup_args lookup;
> > + struct btrfs_ioctl_search_args args;
> > + struct btrfs_ioctl_search_key *sk = &args.key;
> > + struct btrfs_file_extent_item *extent_item;
> > + struct btrfs_ioctl_search_header *header;
> > + u64 pos;
> > + u64 buf_off;
> > + u64 len;
> > + u64 begin;
> > + u64 disk_bytenr;
> > + u64 disk_num_bytes;
> > + u64 offset;
> > + int ret;
> > + int i;
> > + char compress_str[16];
> > +
> > + fd = open(argv[optind], O_RDONLY);
> > + if (fd < 0) {
> > + error("cannot open %s: %m", argv[optind]);
> > + ret = 1;
> > + goto out;
> > + }
> > +
> > + if (fstat(fd, &statbuf) < 0) {
> > + error("failed to fstat %s: %m", argv[optind]);
> > + ret = 1;
> > + goto out;
> > + }
> > +
> > + lookup.treeid = 0;
> > + lookup.objectid = BTRFS_FIRST_FREE_OBJECTID;
> > +
> > + if (ioctl(fd, BTRFS_IOC_INO_LOOKUP, &lookup) < 0) {
> > + error("failed to lookup inode %s: %m", argv[optind]);
> > + ret = 1;
> > + goto out;
> > + }
> > +
> > + pos = 0;
> > +
> > + sk->tree_id = lookup.treeid;
> > + sk->min_objectid = statbuf.st_ino;
> > + sk->max_objectid = statbuf.st_ino;
> > +
> > + sk->max_offset = UINT64_MAX;
> > + sk->min_transid = 0;
> > + sk->max_transid = UINT64_MAX;
> > + sk->min_type = sk->max_type = BTRFS_EXTENT_DATA_KEY;
> > + sk->nr_items = 4096;
>
> You may want to do the tree search ioctl in a loop, as it's pretty common
> for super large or heavily fragmented inode to have way more items than one
> ioctl can return.
I don't think much about this. I wonder if it's proper way to search
tree. is there any better way than this code?
>
> > +
> > + while (statbuf.st_size > pos) {
> > + sk->min_offset = pos;
> > + if (ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args)) {
> > + error("failed to search tree ioctl %s: %m", argv[optind]);
> > + ret = 1;
> > + goto out;
> > + }
> > +
> > + buf_off = 0;
> > + for(i=0; i<sk->nr_items; ++i) {
> > + header = (struct btrfs_ioctl_search_header *)(args.buf + buf_off);
> > +
> > + if (btrfs_search_header_type(header) == BTRFS_EXTENT_DATA_KEY) {
> > + extent_item = (struct btrfs_file_extent_item *)(header + 1);
> > + begin = btrfs_search_header_offset(header);
> > +
> > + printf("type = %s, begin = %llu, ",
> > + file_extent_type_to_str(extent_item->type), begin);
> > + switch (extent_item->type) {
> > + case BTRFS_FILE_EXTENT_INLINE:
> > + len = le64_to_cpu(extent_item->ram_bytes);
> > + printf("end = %llu\n", begin + len);
> > + break;
> > + case BTRFS_FILE_EXTENT_REG:
> > + case BTRFS_FILE_EXTENT_PREALLOC:
> > + len = le64_to_cpu(extent_item->num_bytes);
> > + disk_bytenr = le64_to_cpu(extent_item->disk_bytenr);
> > + disk_num_bytes = le64_to_cpu(extent_item->disk_num_bytes);
> > + offset = le64_to_cpu(extent_item->offset);
> > + compress_type_to_str(extent_item->compression, compress_str);
> > + printf("end = %llu, disk_bytenr = %llu, disk_num_bytes = %llu,"
>
> For "end" we normally mean inclusive end.
> E.g, for @start = 1M, @len = 4K, then the @end should be 1M + 4K - 1.
>
> Thus it would be better to just output the length, not the end.
Okay, Printing output as start and len would be more explicit.
Thanks,
Sidong
>
> (I know this sounds a little nitpicking, but trust me, when you have seen
> too many bugs caused by such offset-by-one behavior, you will be as
> sensitive as me on this)
>
> Thanks,
> Qu
>
> > + " offset = %llu, compression = %s\n",
> > + begin + len, disk_bytenr, disk_num_bytes, offset, compress_str); > +
> > + break;
> > + }
> > +
> > + }
> > + buf_off += sizeof(*header) + btrfs_search_header_len(header);
> > + pos += len;
> > + }
> > +
> > + }
> > + ret = 0;
> > +out:
> > + close(fd);
> > + return ret;
> > +}
> > +DEFINE_SIMPLE_COMMAND(inspect_dump_file_extents, "dump-file-extents");
> > diff --git a/cmds/inspect.c b/cmds/inspect.c
> > index 2ef5f4b6..dfb0e27b 100644
> > --- a/cmds/inspect.c
> > +++ b/cmds/inspect.c
> > @@ -696,6 +696,7 @@ static const struct cmd_group inspect_cmd_group = {
> > &cmd_struct_inspect_dump_tree,
> > &cmd_struct_inspect_dump_super,
> > &cmd_struct_inspect_tree_stats,
> > + &cmd_struct_inspect_dump_file_extents,
> > NULL
> > }
> > };
> >
>
next prev parent reply other threads:[~2021-07-12 6:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-11 16:10 [PATCH v2] btrfs-progs: cmds: Add subcommand that dumps file extents Sidong Yang
2021-07-12 1:16 ` Qu Wenruo
2021-07-12 6:40 ` Sidong Yang [this message]
2021-07-12 6:46 ` Qu Wenruo
2021-07-13 16:45 ` Sidong Yang
2021-07-12 6:52 ` Johannes Thumshirn
2021-07-13 16:54 ` Sidong Yang
2021-07-13 22:16 ` Qu Wenruo
2021-07-14 6:41 ` Sidong Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210712064008.GB68357@realwakka \
--to=realwakka@gmail.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox