From: Shaohua Li <shli@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-block@vger.kernel.org, axboe@kernel.dk, tj@kernel.org,
	lizefan@huawei.com, Kernel-team@fb.com, Shaohua Li <shli@fb.com>
Subject: Re: [RFC] block/loop: make loop cgroup aware
Date: Wed, 23 Aug 2017 12:30:30 -0700
Message-ID: <20170823193030.5zyktynbcomvssq7@kernel.org>
In-Reply-To: <20170823192125.GB7935@redhat.com>

On Wed, Aug 23, 2017 at 03:21:25PM -0400, Vivek Goyal wrote:
> On Wed, Aug 23, 2017 at 11:15:15AM -0700, Shaohua Li wrote:
> > From: Shaohua Li <shli@fb.com>
> > 
> > Not a merge request, for discussion only.
> > 
> > The loop block device handles IO in a separate thread. The IO it
> > actually dispatches isn't cloned from the IO the loop device received,
> > so the dispatched IO loses the cgroup context.
> > 
> > I'm ignoring the buffered IO case for now, which is quite complicated.
> > Making the loop thread aware of the cgroup context doesn't really help
> > there: the loop device only writes to a single file, and in the current
> > writeback cgroup implementation a file can only belong to one cgroup.
> > 
> > For the direct IO case, we could work around the issue in theory. For
> > example, say we assign cgroup1 5MB/s of bandwidth for the loop device
> > and cgroup2 10MB/s. We could then create a special cgroup for the loop
> > thread and assign it at least 15MB/s on the underlying disk, and the
> > two cgroups would be throttled correctly. But this is tricky to set up.
> > 
> > This patch tries to address the issue. When the loop thread is handling
> > IO, it declares that the IO is on behalf of the original task; the
> > block IO code then uses the original task to find the cgroup. The
> > concept probably works for other scenarios too, but I haven't made it
> > generic yet.
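> > 
> > In rough pseudo-C, the pattern is as follows (a sketch only;
> > do_iter_io() here just stands in for the call_read_iter()/
> > call_write_iter() calls in the patch below):
> > 
> > 	/* loop worker, submitting on behalf of the issuing task */
> > 	current->cgroup_task = cmd->cgroup_task; /* ref taken at queue time */
> > 	do_iter_io(cmd);	/* bio_blkcg()/bio_associate_current()
> > 				 * now resolve against the original task */
> > 	current->cgroup_task = NULL;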
> 
> Hi Shaohua, 
> 
> Can the worker thread switch/move to the cgroup the bio is in, do the
> submission, and then switch back? That way IO submitted by the worker
> would be accounted to the cgroup the bio belongs to. Just a thought; I
> don't even know if that's feasible.

That was my first attempt, but moving a thread to a cgroup looks like an
expensive operation.
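
For reference, the migration would look roughly like this per IO (a
sketch only; "issuer" stands for the saved submitting task, i.e. what
the patch keeps in cmd->cgroup_task, and cgroup_attach_task_all() is
the helper vhost uses to pull a kthread into another task's cgroups):

	/* hypothetical per-IO migration, NOT what the patch does */
	cgroup_attach_task_all(issuer, current); /* takes cgroup_mutex and
						  * runs the full migration
						  * path */
	ret = call_write_iter(file, &cmd->iocb, &iter);
	/* ...and the worker still has to be moved back afterwards,
	 * which means remembering its previous cgroups somewhere */

Doing that (plus the move back) for every request means taking
cgroup_mutex and walking the migration path twice per IO.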

Thanks,
Shaohua

> 
> Vivek
> 
> > 
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > ---
> >  block/bio.c                |  5 ++++-
> >  drivers/block/loop.c       | 14 +++++++++++++-
> >  drivers/block/loop.h       |  1 +
> >  include/linux/blk-cgroup.h |  3 ++-
> >  include/linux/sched.h      |  1 +
> >  5 files changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index e241bbc..8f0df3c 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -2058,7 +2058,10 @@ int bio_associate_current(struct bio *bio)
> >  
> >  	get_io_context_active(ioc);
> >  	bio->bi_ioc = ioc;
> > -	bio->bi_css = task_get_css(current, io_cgrp_id);
> > +	if (current->cgroup_task)
> > +		bio->bi_css = task_get_css(current->cgroup_task, io_cgrp_id);
> > +	else
> > +		bio->bi_css = task_get_css(current, io_cgrp_id);
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(bio_associate_current);
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index ef83349..fefede3 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -77,7 +77,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/uio.h>
> >  #include "loop.h"
> > -
> > +#include <linux/sched/task.h>
> >  #include <linux/uaccess.h>
> >  
> >  static DEFINE_IDR(loop_index_idr);
> > @@ -471,6 +471,8 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
> >  {
> >  	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
> >  
> > +	if (cmd->cgroup_task)
> > +		put_task_struct(cmd->cgroup_task);
> >  	cmd->ret = ret;
> >  	blk_mq_complete_request(cmd->rq);
> >  }
> > @@ -502,11 +504,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> >  	cmd->iocb.ki_complete = lo_rw_aio_complete;
> >  	cmd->iocb.ki_flags = IOCB_DIRECT;
> >  
> > +	if (cmd->cgroup_task)
> > +		current->cgroup_task = cmd->cgroup_task;
> > +
> >  	if (rw == WRITE)
> >  		ret = call_write_iter(file, &cmd->iocb, &iter);
> >  	else
> >  		ret = call_read_iter(file, &cmd->iocb, &iter);
> >  
> > +	current->cgroup_task = NULL;
> > +
> >  	if (ret != -EIOCBQUEUED)
> >  		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
> >  	return 0;
> > @@ -1705,6 +1712,11 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  		break;
> >  	}
> >  
> > +	if (cmd->use_aio) {
> > +		cmd->cgroup_task = current;
> > +		get_task_struct(current);
> > +	} else
> > +		cmd->cgroup_task = NULL;
> >  	kthread_queue_work(&lo->worker, &cmd->work);
> >  
> >  	return BLK_STS_OK;
> > diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> > index 2c096b9..eb98d4d 100644
> > --- a/drivers/block/loop.h
> > +++ b/drivers/block/loop.h
> > @@ -73,6 +73,7 @@ struct loop_cmd {
> >  	bool use_aio;           /* use AIO interface to handle I/O */
> >  	long ret;
> >  	struct kiocb iocb;
> > +	struct task_struct *cgroup_task;
> >  };
> >  
> >  /* Support for loadable transfer modules */
> > diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
> > index 9d92153..38a5517 100644
> > --- a/include/linux/blk-cgroup.h
> > +++ b/include/linux/blk-cgroup.h
> > @@ -232,7 +232,8 @@ static inline struct blkcg *bio_blkcg(struct bio *bio)
> >  {
> >  	if (bio && bio->bi_css)
> >  		return css_to_blkcg(bio->bi_css);
> > -	return task_blkcg(current);
> > +	return task_blkcg(current->cgroup_task ?
> > +			current->cgroup_task : current);
> >  }
> >  
> >  static inline struct cgroup_subsys_state *
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 8337e2d..a5958b0 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -897,6 +897,7 @@ struct task_struct {
> >  	struct css_set __rcu		*cgroups;
> >  	/* cg_list protected by css_set_lock and tsk->alloc_lock: */
> >  	struct list_head		cg_list;
> > +	struct task_struct		*cgroup_task;
> >  #endif
> >  #ifdef CONFIG_INTEL_RDT_A
> >  	int				closid;
> > -- 
> > 2.9.5


Thread overview: 8+ messages
2017-08-23 18:15 [RFC] block/loop: make loop cgroup aware Shaohua Li
2017-08-23 19:21 ` Vivek Goyal
2017-08-23 19:30   ` Shaohua Li [this message]
2017-08-28 22:54 ` Tejun Heo
2017-08-29 15:22   ` Shaohua Li
2017-08-29 15:28     ` Tejun Heo
2017-08-30  5:07       ` Shaohua Li
2017-08-31  0:36         ` Tejun Heo
