All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Shaohua Li <shli@kernel.org>
Cc: linux-block@vger.kernel.org, axboe@kernel.dk, lizefan@huawei.com,
	Kernel-team@fb.com, Shaohua Li <shli@fb.com>
Subject: Re: [RFC] block/loop: make loop cgroup aware
Date: Mon, 28 Aug 2017 15:54:59 -0700	[thread overview]
Message-ID: <20170828225459.GE491396@devbig577.frc2.facebook.com> (raw)
In-Reply-To: <dd357e1638021132cd39a11769a745a8318636d3.1503506490.git.shli@fb.com>

Hello, Shaohua.

On Wed, Aug 23, 2017 at 11:15:15AM -0700, Shaohua Li wrote:
> loop block device handles IO in a separate thread. The actual IO
> dispatched isn't cloned from the IO loop device received, so the
> dispatched IO loses the cgroup context.

Ah, right.  Thanks for spotting this.

> I'm ignoring buffer IO case now, which is quite complicated.  Making the
> loop thread aware of cgroup context doesn't really help. The loop device
> only writes to a single file. In current writeback cgroup
> implementation, the file can only belong to one cgroup.

Yeah, writeback tracks the most active cgroup and associates writeback
ios with that cgroup.  For buffered loop devices, I think it'd be fine
to make the loop driver forward the cgroup association and let the
writeback layer determine the matching association as usual.

> For direct IO case, we could workaround the issue in theory. For
> example, say we assign cgroup1 5M/s BW for loop device and cgroup2
> 10M/s. We can create a special cgroup for loop thread and assign at
> least 15M/s for the underlayer disk. In this way, we correctly throttle
> the two cgroups. But this is tricky to setup.
> 
> This patch tries to address the issue. When loop thread is handling IO,
> it declares the IO is on behalf of the original task, then in block IO
> we use the original task to find cgroup. The concept probably works for
> other scenarios too, but right now I don't make it generic yet.

The overall approach looks sound to me.  Some comments below.

> @@ -2058,7 +2058,10 @@ int bio_associate_current(struct bio *bio)
>  
>  	get_io_context_active(ioc);
>  	bio->bi_ioc = ioc;
> -	bio->bi_css = task_get_css(current, io_cgrp_id);
> +	if (current->cgroup_task)
> +		bio->bi_css = task_get_css(current->cgroup_task, io_cgrp_id);
> +	else
> +		bio->bi_css = task_get_css(current, io_cgrp_id);

Do we need a full pointer field for this?  I think we should be able
to mark lo writers with a flag (PF or whatever) and then to
kthread_data() to get the lo struct which contains the target css.
Oh, let's do csses instead of tasks for consistency, correctness
(please see below) and performance (csses are cheaper to pin).

> @@ -502,11 +504,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>  	cmd->iocb.ki_complete = lo_rw_aio_complete;
>  	cmd->iocb.ki_flags = IOCB_DIRECT;
>  
> +	if (cmd->cgroup_task)
> +		current->cgroup_task = cmd->cgroup_task;
> +
>  	if (rw == WRITE)
>  		ret = call_write_iter(file, &cmd->iocb, &iter);
>  	else
>  		ret = call_read_iter(file, &cmd->iocb, &iter);
>  
> +	current->cgroup_task = NULL;
> +
>  	if (ret != -EIOCBQUEUED)
>  		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
>  	return 0;

Hmm... why do we need double forwarding (original issuer -> aio cmd ->
ios) in loop?  If we need this, doesn't this mean that we're missing
ownership forwarding in aio paths and should make the forwarding
happen there?

> @@ -1705,6 +1712,11 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		break;
>  	}
>  
> +	if (cmd->use_aio) {
> +		cmd->cgroup_task = current;
> +		get_task_struct(current);
> +	} else
> +		cmd->cgroup_task = NULL;

What if %current isn't the original issuer of the io?  It could be a
writeback worker trying to flush to a loop device and we don't want to
attribute those ios to the writeback worker.  We should forward the
bi_css not the current task.

Thanks!

-- 
tejun

  parent reply	other threads:[~2017-08-28 22:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-23 18:15 [RFC] block/loop: make loop cgroup aware Shaohua Li
2017-08-23 19:21 ` Vivek Goyal
2017-08-23 19:30   ` Shaohua Li
2017-08-28 22:54 ` Tejun Heo [this message]
2017-08-29 15:22   ` Shaohua Li
2017-08-29 15:28     ` Tejun Heo
2017-08-30  5:07       ` Shaohua Li
2017-08-31  0:36         ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170828225459.GE491396@devbig577.frc2.facebook.com \
    --to=tj@kernel.org \
    --cc=Kernel-team@fb.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=shli@fb.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.