From: Tejun Heo <tj@kernel.org>
To: Boris Burkov <boris@bur.io>
Cc: Jens Axboe <axboe@kernel.dk>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH blk-cgroup/for-5.8] blk-cgroup: show global disk stats in root cgroup io.stat
Date: Mon, 1 Jun 2020 11:43:51 -0400
Message-ID: <20200601154351.GD31548@mtj.thefacebook.com>
In-Reply-To: <20200529232017.1795920-1-boris@bur.io>

Hello, Boris.

On Fri, May 29, 2020 at 04:20:17PM -0700, Boris Burkov wrote:
> In order to improve consistency and usability in cgroup stat accounting,
> we would like to support the root cgroup's io.stat.
> 
> Since the root cgroup has processes doing io even if the system has no
> explicitly created cgroups, we need to be careful to avoid overhead in
> that case.  For that reason, the rstat algorithms don't handle the root
> cgroup, so just turning the file on wouldn't give correct statistics.
> 
> To get around this, we simulate flushing the iostat struct by filling it
> out directly from global disk stats. The result is a root cgroup io.stat
> file consistent with both /proc/diskstats and io.stat.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> Suggested-by: Tejun Heo <tj@kernel.org>
...
> +static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
> +{

Can you please split the code reorganization out into a separate patch so
that the actual change can be reviewed clearly?

> +/*
> + * The rstat algorithms intentionally don't handle the root cgroup to avoid
> + * incurring overhead when no cgroups are defined. For that reason,
> + * cgroup_rstat_flush in blkcg_print_stat does not actually fill out the
> + * iostat in the root cgroup's blkcg_gq.
> + *
> + * However, we would like to re-use the printing code between the root and
> + * non-root cgroups to the extent possible. For that reason, we simulate
> + * flushing the root cgroup's stats by explicitly filling in the iostat
> + * with disk level statistics.
> + */

This is clever and neat.
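
(The hunk that wires this into blkcg_print_stat() isn't quoted above; roughly,
it special-cases the root cgroup along these lines, as a sketch rather than
the exact hunk:

	if (!seq_css(sf)->parent)
		/* root cgroup: rstat never flushes here, fill from disk stats */
		blkcg_fill_root_iostats();
	else
		/* non-root: flush the accumulated per-cpu rstat deltas as usual */
		cgroup_rstat_flush(blkcg->css.cgroup);
)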

> +static void blkcg_fill_root_iostats(void)
> +{
> +	struct class_dev_iter iter;
> +	struct device *dev;
> +
> +	class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
> +	while ((dev = class_dev_iter_next(&iter))) {
> +		struct gendisk *disk = dev_to_disk(dev);
> +		struct hd_struct *part = disk_get_part(disk, 0);
> +		struct blkcg_gq *blkg = blk_queue_root_blkg(disk->queue);
> +		struct blkg_iostat tmp;
> +		int cpu;
> +
> +		memset(&tmp, 0, sizeof(tmp));
> +		for_each_possible_cpu(cpu) {
> +			struct disk_stats *cpu_dkstats;
> +
> +			cpu_dkstats = per_cpu_ptr(part->dkstats, cpu);
> +			tmp.ios[BLKG_IOSTAT_READ] +=
> +				cpu_dkstats->ios[STAT_READ];
> +			tmp.ios[BLKG_IOSTAT_WRITE] +=
> +				cpu_dkstats->ios[STAT_WRITE];
> +			tmp.ios[BLKG_IOSTAT_DISCARD] +=
> +				cpu_dkstats->ios[STAT_DISCARD];
> +			// convert sectors to bytes
> +			tmp.bytes[BLKG_IOSTAT_READ] +=
> +				cpu_dkstats->sectors[STAT_READ] << 9;
> +			tmp.bytes[BLKG_IOSTAT_WRITE] +=
> +				cpu_dkstats->sectors[STAT_WRITE] << 9;
> +			tmp.bytes[BLKG_IOSTAT_DISCARD] +=
> +				cpu_dkstats->sectors[STAT_DISCARD] << 9;
> +
> +			u64_stats_update_begin(&blkg->iostat.sync);
> +			blkg_iostat_set(&blkg->iostat.cur, &tmp);
> +			u64_stats_update_end(&blkg->iostat.sync);
> +		}
> +	}
> +}
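
(With the root blkg's iostat filled in this way, the root cgroup's io.stat
should print per-device totals in the same format as any other cgroup's
io.stat, e.g. something like the following, with illustrative values:

	8:0 rbytes=90112 wbytes=299008 rios=8 wios=28 dbytes=0 dios=0
)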
...
> diff --git a/block/genhd.c b/block/genhd.c
> index afdb2c3e5b22..4f5f4590517c 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -38,8 +38,6 @@ static struct kobject *block_depr;
>  static DEFINE_SPINLOCK(ext_devt_lock);
>  static DEFINE_IDR(ext_devt_idr);
>  
> -static const struct device_type disk_type;
> -
>  static void disk_check_events(struct disk_events *ev,
>  			      unsigned int *clearing_ptr);
>  static void disk_alloc_events(struct gendisk *disk);
> @@ -1566,7 +1564,7 @@ static char *block_devnode(struct device *dev, umode_t *mode,
>  	return NULL;
>  }
>  
> -static const struct device_type disk_type = {
> +const struct device_type disk_type = {
>  	.name		= "disk",
>  	.groups		= disk_attr_groups,
>  	.release	= disk_release,
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index a9384449465a..ea38bc36bc6d 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -26,6 +26,7 @@
>  #define disk_to_dev(disk)	(&(disk)->part0.__dev)
>  #define part_to_dev(part)	(&((part)->__dev))
>  
> +extern const struct device_type disk_type;

So, this is fine but I'd explicitly mention it in the patch description.

Thanks.

-- 
tejun

