Linux cgroups development
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Boris Burkov <boris-UrQYPotlwNs@public.gmane.org>
Cc: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-team-b10kYP2dOMg@public.gmane.org
Subject: Re: [PATCH blk-cgroup/for-5.8] blk-cgroup: show global disk stats in root cgroup io.stat
Date: Mon, 1 Jun 2020 11:43:51 -0400
Message-ID: <20200601154351.GD31548@mtj.thefacebook.com>
In-Reply-To: <20200529232017.1795920-1-boris-UrQYPotlwNs@public.gmane.org>

Hello, Boris.

On Fri, May 29, 2020 at 04:20:17PM -0700, Boris Burkov wrote:
> In order to improve consistency and usability in cgroup stat accounting,
> we would like to support the root cgroup's io.stat.
> 
> Since the root cgroup has processes doing io even if the system has no
> explicitly created cgroups, we need to be careful to avoid overhead in
> that case.  For that reason, the rstat algorithms don't handle the root
> cgroup, so just turning the file on wouldn't give correct statistics.
> 
> To get around this, we simulate flushing the iostat struct by filling it
> out directly from global disk stats. The result is a root cgroup io.stat
> file consistent with both /proc/diskstats and io.stat.
> 
> Signed-off-by: Boris Burkov <boris-UrQYPotlwNs@public.gmane.org>
> Suggested-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
...
> +static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
> +{

Can you please split the code reorganization into its own patch so that
the actual change can be reviewed clearly?

> +/*
> + * The rstat algorithms intentionally don't handle the root cgroup to avoid
> + * incurring overhead when no cgroups are defined. For that reason,
> + * cgroup_rstat_flush in blkcg_print_stat does not actually fill out the
> + * iostat in the root cgroup's blkcg_gq.
> + *
> + * However, we would like to re-use the printing code between the root and
> + * non-root cgroups to the extent possible. For that reason, we simulate
> + * flushing the root cgroup's stats by explicitly filling in the iostat
> + * with disk level statistics.
> + */

This is clever and neat.

> +static void blkcg_fill_root_iostats(void)
> +{
> +	struct class_dev_iter iter;
> +	struct device *dev;
> +
> +	class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
> +	while ((dev = class_dev_iter_next(&iter))) {
> +		struct gendisk *disk = dev_to_disk(dev);
> +		struct hd_struct *part = disk_get_part(disk, 0);
> +		struct blkcg_gq *blkg = blk_queue_root_blkg(disk->queue);
> +		struct blkg_iostat tmp;
> +		int cpu;
> +
> +		memset(&tmp, 0, sizeof(tmp));
> +		for_each_possible_cpu(cpu) {
> +			struct disk_stats *cpu_dkstats;
> +
> +			cpu_dkstats = per_cpu_ptr(part->dkstats, cpu);
> +			tmp.ios[BLKG_IOSTAT_READ] +=
> +				cpu_dkstats->ios[STAT_READ];
> +			tmp.ios[BLKG_IOSTAT_WRITE] +=
> +				cpu_dkstats->ios[STAT_WRITE];
> +			tmp.ios[BLKG_IOSTAT_DISCARD] +=
> +				cpu_dkstats->ios[STAT_DISCARD];
> +			// convert sectors to bytes
> +			tmp.bytes[BLKG_IOSTAT_READ] +=
> +				cpu_dkstats->sectors[STAT_READ] << 9;
> +			tmp.bytes[BLKG_IOSTAT_WRITE] +=
> +				cpu_dkstats->sectors[STAT_WRITE] << 9;
> +			tmp.bytes[BLKG_IOSTAT_DISCARD] +=
> +				cpu_dkstats->sectors[STAT_DISCARD] << 9;
> +
> +			u64_stats_update_begin(&blkg->iostat.sync);
> +			blkg_iostat_set(&blkg->iostat.cur, &tmp);
> +			u64_stats_update_end(&blkg->iostat.sync);
> +		}
> +	}
> +}
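
For readers following along, the aggregation in the hunk above can be
sketched in userspace roughly as follows. This is only an illustration, not
kernel code: cpu_disk_stats, io_totals, sum_disk_stats() and the OP_* names
are hypothetical stand-ins for struct disk_stats, struct blkg_iostat and the
STAT_* / BLKG_IOSTAT_* indices, and a plain array stands in for the per-cpu
data. It also copies the result out once after the loop, whereas the quoted
patch calls blkg_iostat_set() on every iteration; the final value is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
enum { OP_READ, OP_WRITE, OP_DISCARD, OP_NR };

struct cpu_disk_stats {		/* models one CPU's struct disk_stats */
	uint64_t ios[OP_NR];
	uint64_t sectors[OP_NR];
};

struct io_totals {		/* models struct blkg_iostat */
	uint64_t ios[OP_NR];
	uint64_t bytes[OP_NR];
};

#define SECTOR_SHIFT 9		/* 512-byte sectors, i.e. the "<< 9" above */

/* Sum every CPU's counters into *dst, converting sectors to bytes. */
static void sum_disk_stats(struct io_totals *dst,
			   const struct cpu_disk_stats *per_cpu, size_t ncpus)
{
	struct io_totals tmp;
	size_t cpu;
	int op;

	assert(dst != NULL && (per_cpu != NULL || ncpus == 0));

	memset(&tmp, 0, sizeof(tmp));
	for (cpu = 0; cpu < ncpus; cpu++) {
		for (op = 0; op < OP_NR; op++) {
			tmp.ios[op] += per_cpu[cpu].ios[op];
			tmp.bytes[op] += per_cpu[cpu].sectors[op] << SECTOR_SHIFT;
		}
	}

	/* Copy the totals out once, after all CPUs have been summed. */
	*dst = tmp;
}
```

Calling sum_disk_stats() with one array entry per CPU yields the disk-wide
totals that the patch stores in the root blkg's iostat.
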
...
> diff --git a/block/genhd.c b/block/genhd.c
> index afdb2c3e5b22..4f5f4590517c 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -38,8 +38,6 @@ static struct kobject *block_depr;
>  static DEFINE_SPINLOCK(ext_devt_lock);
>  static DEFINE_IDR(ext_devt_idr);
>  
> -static const struct device_type disk_type;
> -
>  static void disk_check_events(struct disk_events *ev,
>  			      unsigned int *clearing_ptr);
>  static void disk_alloc_events(struct gendisk *disk);
> @@ -1566,7 +1564,7 @@ static char *block_devnode(struct device *dev, umode_t *mode,
>  	return NULL;
>  }
>  
> -static const struct device_type disk_type = {
> +const struct device_type disk_type = {
>  	.name		= "disk",
>  	.groups		= disk_attr_groups,
>  	.release	= disk_release,
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index a9384449465a..ea38bc36bc6d 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -26,6 +26,7 @@
>  #define disk_to_dev(disk)	(&(disk)->part0.__dev)
>  #define part_to_dev(part)	(&((part)->__dev))
>  
> +extern const struct device_type disk_type;

So, this is fine but I'd explicitly mention in the patch description that
disk_type is no longer static and is now declared in genhd.h.

Thanks.

-- 
tejun