linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: Re: [RFC][PATCH] btrfs-progs: inspect: new subcommand to dump chunks
Date: Wed, 22 Jun 2016 18:53:52 -0700	[thread overview]
Message-ID: <20160623015352.GA15815@localhost.localdomain> (raw)
In-Reply-To: <1466616406-28087-1-git-send-email-dsterba@suse.com>

On Wed, Jun 22, 2016 at 07:26:46PM +0200, David Sterba wrote:
> From: David Sterba <dsterba@suse.cz>
> 
> Hi,
> 
> the chunk dump is a useful thing, for debugging or balance filters.
> 
> Example output:
> 
> Chunks on device id: 1
> PNumber            Type        PStart        Length          PEnd     Age         LStart  Usage
> -----------------------------------------------------------------------------------------------
>       0   System/RAID1        1.00MiB      32.00MiB      33.00MiB      47        1.40TiB   0.06
>       1 Metadata/RAID1       33.00MiB       1.00GiB       1.03GiB      31        1.36TiB  64.03
>       2 Metadata/RAID1        1.03GiB      32.00MiB       1.06GiB      36        1.36TiB  77.28
>       3     Data/single       1.06GiB       1.00GiB       2.06GiB      12      422.30GiB  78.90
>       4     Data/single       2.06GiB       1.00GiB       3.06GiB      11      420.30GiB  78.47
> ...
> 
> (full output is at http://susepaste.org/view/raw/089a9877)
> 
> This patch does the basic output, not filtering or sorting besides physical and
> logical offset. There are possiblities to do more or enhance the output (eg.
> starting with logical chunk and list the related physcial chunks together).
> Or filter by type/profile, or understand the balance filters format.
> 
> As it is now, it's per-device dump of physical layout, which was the original
> idea.
> 
> Printing 'usage' is not default as it's quite slow, it uses the search ioctl
> and probably not in the best way, or there's some other issue in the
> implementation.
> 
> I'll add the patch to devel branch but will not add it for any particular
> release yet, I'd like some feedback first. Thanks.
> 
> -------------
> New command 'btrfs inspect-internal dump-chunks' will dump layout of
> chunks as stored on the devices. This corresponds to the physical
> layout, sorted by the physical offset. The block group usage can be
> shown as well, but the search is too slow so it's off by default.
> 
> If the physical offset sorting is selected, the empty space between
> chunks is also shown.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  cmds-inspect.c | 364 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 364 insertions(+)
> 
> diff --git a/cmds-inspect.c b/cmds-inspect.c
> index dd7b9dd278f2..4eace63d8517 100644
> --- a/cmds-inspect.c
> +++ b/cmds-inspect.c
> @@ -623,6 +623,368 @@ static int cmd_inspect_min_dev_size(int argc, char **argv)
>  	return !!ret;
>  }
>  
> +static const char * const cmd_dump_chunks_usage[] = {
> +	"btrfs inspect-internal chunk-stats [options] <path>",
> +	"Show chunks (block groups) layout",
> +	"Show chunks (block groups) layout for all devices",
> +	"",
> +	HELPINFO_UNITS_LONG,
> +	"--sort=MODE        sort by the physical or logical chunk start",
> +	"                   MODE is one of pstart or lstart (default: pstart)",
> +	"--usage            show usage per block group, note this can be slow",
> +	NULL
> +};
> +
> +enum {
> +	CHUNK_SORT_PSTART,
> +	CHUNK_SORT_LSTART,
> +	CHUNK_SORT_DEFAULT = CHUNK_SORT_PSTART
> +};
> +
> +struct dump_chunks_entry {
> +	u64 devid;
> +	u64 start;
> +	u64 lstart;
> +	u64 length;
> +	u64 flags;
> +	u64 age;
> +	u64 used;
> +	u32 pnumber;
> +};
> +
> +struct dump_chunks_ctx {
> +	unsigned length;
> +	unsigned size;
> +	struct dump_chunks_entry *stats;
> +};
> +
> +int cmp_cse_devid_start(const void *va, const void *vb)
> +{
> +	const struct dump_chunks_entry *a = va;
> +	const struct dump_chunks_entry *b = vb;
> +
> +	if (a->devid < b->devid)
> +		return -1;
> +	if (a->devid > b->devid)
> +		return 1;
> +
> +	if (a->start < b->start)
> +		return -1;
> +	if (a->start == b->start) {
> +		error(
> +	"chunks start on same offset in the same device: devid %llu start %llu",
> +		    (unsigned long long)a->devid, (unsigned long long)a->start);
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +int cmp_cse_devid_lstart(const void *va, const void *vb)
> +{
> +	const struct dump_chunks_entry *a = va;
> +	const struct dump_chunks_entry *b = vb;
> +
> +	if (a->devid < b->devid)
> +		return -1;
> +	if (a->devid > b->devid)
> +		return 1;
> +
> +	if (a->lstart < b->lstart)
> +		return -1;
> +	if (a->lstart == b->lstart) {
> +		error(
> +"chunks logically start on same offset in the same device: devid %llu start %llu",
> +		    (unsigned long long)a->devid, (unsigned long long)a->lstart);
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +void print_dump_chunks(struct dump_chunks_ctx *ctx, unsigned sort_mode,
> +		unsigned unit_mode, int with_usage)
> +{
> +	u64 devid;
> +	struct dump_chunks_entry e;
> +	int i;
> +	int chidx;
> +	u64 lastend = 0;
> +	u64 age;
> +
> +	/*
> +	 * Chunks are sorted logically as found by the ioctl, we need to sort
> +	 * them once to find the physical ordering. This is the default mode.
> +	 */
> +	qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_start);
> +	devid = 0;
> +	age = 0;
> +	for (i = 0; i < ctx->length; i++) {
> +		e = ctx->stats[i];
> +		if (e.devid != devid) {
> +			devid = e.devid;
> +			age = 0;
> +		}
> +		ctx->stats[i].pnumber = age;
> +		age++;
> +	}
> +
> +	if (sort_mode == CHUNK_SORT_LSTART)
> +		qsort(ctx->stats, ctx->length, sizeof(ctx->stats[0]), cmp_cse_devid_lstart);
> +
> +	devid = 0;
> +	for (i = 0; i < ctx->length; i++) {
> +		e = ctx->stats[i];
> +		if (e.devid != devid) {
> +			devid = e.devid;
> +			if (i != 0)
> +				putchar('\n');
> +			printf("Chunks on device id: %llu\n", devid);
> +			printf("PNumber            Type        PStart        Length          PEnd     Age         LStart%s\n",
> +					with_usage ? "  Usage" : "");
> +			printf("----------------------------------------------------------------------------------------%s\n",
> +					with_usage ? "-------" : "");
> +			chidx = 0;
> +			lastend = 0;
> +		}
> +		if (sort_mode == CHUNK_SORT_PSTART && lastend > 0
> +		    && e.start != lastend) {
> +			printf("      .           empty             .  ");
> +			printf("%12s  ",
> +				pretty_size_mode(e.start - lastend, unit_mode));
> +			printf("           .       .              .\n");
> +		}
> +
> +		printf("%7u ", e.pnumber);
> +		printf("%8s/%-6s  ", btrfs_group_type_str(e.flags),
> +				btrfs_group_profile_str(e.flags));
> +		printf("%12s  ", pretty_size_mode(e.start, unit_mode));
> +		printf("%12s  ", pretty_size_mode(e.length, unit_mode));
> +		printf("%12s  ",
> +			pretty_size_mode(e.start + e.length - 1, unit_mode));
> +		printf("%6llu ", e.age);
> +		printf("%14s", pretty_size_mode(e.lstart, unit_mode));
> +		if (with_usage)
> +			printf("  %5.2f", (float)e.used / e.length * 100);
> +		printf("\n");
> +
> +		lastend = e.start + e.length;
> +		chidx++;
> +	}
> +}
> +
> +static u64 fill_usage(int fd, u64 lstart)
> +{
> +	struct btrfs_ioctl_search_args args;
> +	struct btrfs_ioctl_search_key *sk = &args.key;
> +	struct btrfs_ioctl_search_header sh;
> +	struct btrfs_block_group_item *item;
> +	int ret;
> +
> +	memset(&args, 0, sizeof(args));
> +	sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
> +	sk->min_objectid = lstart;
> +	sk->max_objectid = lstart;
> +	sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> +	sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> +	sk->min_offset = 0;
> +	sk->max_offset = (u64)-1;
> +	sk->max_transid = (u64)-1;
> +
> +	sk->nr_items = 4096;

What if we set nr_items = 1?  From the code review, this can let us
stop and return immediately.

Thanks,

-liubo
> +	ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
> +	if (ret < 0) {
> +		error("cannot perform the search: %s", strerror(errno));
> +		return 1;
> +	}
> +	if (sk->nr_items == 0) {
> +		warning("blockgroup %llu not found",
> +				(unsigned long long)lstart);
> +		return 0;
> +	}
> +	if (sk->nr_items > 1) {
> +		warning("found more than one blockgroup %llu",
> +				(unsigned long long)lstart);
> +	}
> +
> +	memcpy(&sh, args.buf, sizeof(sh));
> +	item = (struct btrfs_block_group_item*)(args.buf + sizeof(sh));
> +
> +	return item->used;
> +}
> +
> +static int cmd_dump_chunks(int argc, char **argv)
> +{
> +	struct btrfs_ioctl_search_args args;
> +	struct btrfs_ioctl_search_key *sk = &args.key;
> +	struct btrfs_ioctl_search_header sh;
> +	unsigned long off = 0;
> +	u64 *age = 0;
> +	unsigned age_size = 128;
> +	int ret;
> +	int fd;
> +	int i;
> +	int e;
> +	DIR *dirstream = NULL;
> +	unsigned unit_mode;
> +	unsigned sort_mode = 0;
> +	int with_usage = 0;
> +	const char *path;
> +	struct dump_chunks_ctx ctx = {
> +		.length = 0,
> +		.size = 1024,
> +		.stats = NULL
> +	};
> +
> +	unit_mode = get_unit_mode_from_arg(&argc, argv, 0);
> +
> +	while (1) {
> +		int c;
> +		enum { GETOPT_VAL_SORT = 256, GETOPT_VAL_USAGE };
> +		static const struct option long_options[] = {
> +			{"sort", required_argument, NULL, GETOPT_VAL_SORT },
> +			{"usage", no_argument, NULL, GETOPT_VAL_USAGE },
> +			{NULL, 0, NULL, 0}
> +		};
> +
> +		c = getopt_long(argc, argv, "", long_options, NULL);
> +		if (c < 0)
> +			break;
> +
> +		switch (c) {
> +		case GETOPT_VAL_SORT:
> +			if (strcmp(optarg, "pstart") == 0) {
> +				sort_mode = CHUNK_SORT_PSTART;
> +			} else if (strcmp(optarg, "lstart") == 0) {
> +				sort_mode = CHUNK_SORT_LSTART;
> +			} else {
> +				error("unknown sort mode: %s", optarg);
> +				exit(1);
> +			}
> +			break;
> +		case GETOPT_VAL_USAGE:
> +			with_usage = 1;
> +			break;
> +		default:
> +			usage(cmd_dump_chunks_usage);
> +		}
> +	}
> +
> +	if (check_argc_exact(argc - optind, 1))
> +		usage(cmd_dump_chunks_usage);
> +
> +	ctx.stats = calloc(ctx.size, sizeof(ctx.stats[0]));
> +	if (!ctx.stats)
> +		goto out_nomem;
> +
> +	path = argv[optind];
> +
> +	fd = open_file_or_dir(path, &dirstream);
> +	if (fd < 0) {
> +	        error("cannot access '%s': %s", path, strerror(errno));
> +		return 1;
> +	}
> +
> +	memset(&args, 0, sizeof(args));
> +	sk->tree_id = BTRFS_CHUNK_TREE_OBJECTID;
> +	sk->min_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +	sk->max_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +	sk->min_type = BTRFS_CHUNK_ITEM_KEY;
> +	sk->max_type = BTRFS_CHUNK_ITEM_KEY;
> +	sk->max_offset = (u64)-1;
> +	sk->max_transid = (u64)-1;
> +	age = calloc(age_size, sizeof(u64));
> +	if (!age)
> +		goto out_nomem;
> +
> +	while (1) {
> +		sk->nr_items = 4096;
> +		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
> +		e = errno;
> +		if (ret < 0) {
> +			error("cannot perform the search: %s", strerror(e));
> +			return 1;
> +		}
> +		if (sk->nr_items == 0)
> +			break;
> +
> +		off = 0;
> +		for (i = 0; i < sk->nr_items; i++) {
> +			struct btrfs_chunk *item;
> +			struct btrfs_stripe *stripes;
> +			int sidx;
> +			u64 used = (u64)-1;
> +
> +			memcpy(&sh, args.buf + off, sizeof(sh));
> +			off += sizeof(sh);
> +			item = (struct btrfs_chunk*)(args.buf + off);
> +			off += sh.len;
> +
> +			stripes = &item->stripe;
> +			for (sidx = 0; sidx < item->num_stripes; sidx++) {
> +				struct dump_chunks_entry *e;
> +				u64 devid;
> +
> +				e = &ctx.stats[ctx.length];
> +				devid = stripes[sidx].devid;
> +				e->devid = devid;
> +				e->start = stripes[sidx].offset;
> +				e->lstart = sh.offset;
> +				e->length = item->length;
> +				e->flags = item->type;
> +				e->pnumber = -1;
> +				while (devid > age_size) {
> +					u64 *tmp;
> +					unsigned old_size = age_size;
> +
> +					age_size += 128;
> +					tmp = calloc(age_size, sizeof(u64));
> +					if (!tmp) {
> +						free(age);
> +						goto out_nomem;
> +					}
> +					memcpy(tmp, age, sizeof(u64) * old_size);
> +					age = tmp;
> +				}
> +				e->age = age[devid]++;
> +				if (with_usage) {
> +					if (used == (u64)-1)
> +						used = fill_usage(fd, sh.offset);
> +					e->used = used;
> +				} else {
> +					e->used = 0;
> +				}
> +
> +				ctx.length++;
> +
> +				if (ctx.length == ctx.size) {
> +					ctx.size += 1024;
> +					ctx.stats = realloc(ctx.stats, ctx.size
> +						* sizeof(ctx.stats[0]));
> +					if (!ctx.stats)
> +						goto out_nomem;
> +				}
> +			}
> +
> +			sk->min_objectid = sh.objectid;
> +			sk->min_type = sh.type;
> +			sk->min_offset = sh.offset;
> +		}
> +		if (sk->min_offset < (u64)-1)
> +			sk->min_offset++;
> +		else
> +			break;
> +	}
> +
> +	print_dump_chunks(&ctx, sort_mode, unit_mode, with_usage);
> +	free(ctx.stats);
> +
> +	close_file_or_dir(fd, dirstream);
> +	return 0;
> +
> +out_nomem:
> +	error("not enough memory");
> +	return 1;
> +}
> +
>  static const char inspect_cmd_group_info[] =
>  "query various internal information";
>  
> @@ -644,6 +1006,8 @@ const struct cmd_group inspect_cmd_group = {
>  				cmd_inspect_dump_super_usage, NULL, 0 },
>  		{ "tree-stats", cmd_inspect_tree_stats,
>  				cmd_inspect_tree_stats_usage, NULL, 0 },
> +		{ "dump-chunks", cmd_dump_chunks, cmd_dump_chunks_usage, NULL,
> +			0 },
>  		NULL_CMD_STRUCT
>  	}
>  };
> -- 
> 2.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2016-06-23  1:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-22 17:26 [RFC][PATCH] btrfs-progs: inspect: new subcommand to dump chunks David Sterba
2016-06-22 22:20 ` Hans van Kranenburg
2016-06-23 13:13   ` David Sterba
2016-06-23 13:17     ` Hans van Kranenburg
2016-06-23  1:10 ` Hans van Kranenburg
2016-06-23 13:27   ` David Sterba
2016-06-23  1:20 ` Qu Wenruo
2016-06-23 13:07   ` David Sterba
2016-06-23  1:53 ` Liu Bo [this message]
2016-06-23 12:43   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160623015352.GA15815@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=dsterba@suse.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).