cgroups.vger.kernel.org archive mirror
* blkio cgroups controller doesn't work with LVM?
@ 2016-02-24 18:12 Chris Friesen
       [not found] ` <56CDF283.9010802-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Friesen @ 2016-02-24 18:12 UTC (permalink / raw)
  To: dm-devel-H+wXaHxf7aLQT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA


Hi,

Are there known limitations with the blkio cgroup controller when used with LVM?

I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with CentOS 7.

I set up two groups, /sys/fs/cgroup/blkio/test1 and /sys/fs/cgroup/blkio/test2. 
  I set the weight for test1 to 500, and the weight for test2 to 1000.
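
Roughly, the setup looks like this - standard cgroup v1 blkio interface,
with the shells that later run the IO moved into each group (exact
commands reconstructed from memory, so treat them as a sketch):

  # create the two groups and assign the relative weights
  mkdir /sys/fs/cgroup/blkio/test1 /sys/fs/cgroup/blkio/test2
  echo 500  > /sys/fs/cgroup/blkio/test1/blkio.weight
  echo 1000 > /sys/fs/cgroup/blkio/test2/blkio.weight

  # move the current shell (and its children) into a group
  echo $$ > /sys/fs/cgroup/blkio/test1/tasks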

If I then read/write to a file on /dev/sda1 from a process in each group, then 
everything works as expected and test1 gets half the bandwidth of test2.

If I do the same but accessing an LVM logical volume, where the underlying 
physical volume is /dev/sda4, then both groups get the same bandwidth.

It appears that the group weight doesn't get propagated down through all the 
layers.  Am I missing something?  If not, is this documented anywhere?

Thanks,
Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: blkio cgroups controller doesn't work with LVM?
       [not found] ` <56CDF283.9010802-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
@ 2016-02-25  7:48   ` Nikolay Borisov
       [not found]     ` <56CEB1BC.4000005-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-02-25  7:48 UTC (permalink / raw)
  To: Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA



On 02/24/2016 08:12 PM, Chris Friesen wrote:
> 
> Hi,
> 
> Are there known limitations with the blkio cgroup controller when used
> with LVM?
> 
> I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with
> CentOS 7.
> 
> I set up two groups, /sys/fs/cgroup/blkio/test1 and
> /sys/fs/cgroup/blkio/test2.  I set the weight for test1 to 500, and the
> weight for test2 to 1000.

The weighted mode of blkio works only with the CFQ scheduler, and as far
as I have seen you cannot set CFQ as the scheduler of DM devices. In that
case you can use the blkio throttling mechanism instead. That's what I've
encountered in practice, though I'd be happy to be proven wrong by
someone. I believe the following sentence in the blkio controller
documentation states as much:
"
First one is proportional weight time based division of disk policy. It
is implemented in CFQ. Hence this policy takes effect only on leaf nodes
when CFQ is being used.
"
> 
> If I then read/write to a file on /dev/sda1 from a process in each
> group, then everything works as expected and test1 gets half the
> bandwidth of test2.
> 
> If I do the same but accessing an LVM logical volume, where the
> underlying physical volume is /dev/sda4, then both groups get the same
> bandwidth.
> 
> It appears that the group weight doesn't get propagated down through all
> the layers.  Am I missing something?  If not, is this documented anywhere?
> 
> Thanks,
> Chris
> -- 
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: blkio cgroups controller doesn't work with LVM?
       [not found]     ` <56CEB1BC.4000005-6AxghH7DbtA@public.gmane.org>
@ 2016-02-25 14:53       ` Mike Snitzer
       [not found]         ` <20160225145314.GA20699-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Snitzer @ 2016-02-25 14:53 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, tejun-DgEjT+Ai2ygdnm+yROfE0A,
	Vivek Goyal

On Thu, Feb 25 2016 at  2:48am -0500,
Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:

> 
> 
> On 02/24/2016 08:12 PM, Chris Friesen wrote:
> > 
> > Hi,
> > 
> > Are there known limitations with the blkio cgroup controller when used
> > with LVM?
> > 
> > I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with
> > CentOS 7.
> > 
> > I set up two groups, /sys/fs/cgroup/blkio/test1 and
> > /sys/fs/cgroup/blkio/test2.  I set the weight for test1 to 500, and the
> > weight for test2 to 1000.
> 
> The weighed mode of blkio works only with CFQ scheduler. And as far as I
> have seen you cannot set CFQ to be the scheduler of DM devices. In this
> case you can use the BLK io throttling mechanism. That's what I've
> encountered in my practice. Though I'd be happy to be proven wrong by
> someone. I believe the following sentence in the blkio controller states
> that:
> "
> First one is proportional weight time based division of disk policy. It
> is implemented in CFQ. Hence this policy takes effect only on leaf nodes
> when CFQ is being used.
> "

Right, LVM created devices are bio-based DM devices in the kernel.
bio-based block devices do _not_ have an IO scheduler.  Their underlying
request-based device does.

I'm not well-versed on the top-level cgroup interface and how it maps to
associated resources that are established in the kernel.  But it could
be that the configuration of blkio cgroup against a bio-based LVM device
needs to be passed through to the underlying request-based device
(e.g. /dev/sda4 in Chris's case)?

I'm also wondering whether the latest cgroup work that Tejun has just
finished (afaik to support buffered IO in the IO controller) will afford
us a more meaningful reason to work to make cgroups' blkio controller
actually work with bio-based devices like LVM's DM devices?

I'm very much open to advice on how to proceed with investigating this
integration work.  Tejun, Vivek, anyone else: if you have advice on next
steps for DM on this front _please_ yell, thanks!

Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: blkio cgroups controller doesn't work with LVM?
       [not found]         ` <20160225145314.GA20699-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-02-25 15:15           ` Chris Friesen
  2016-02-26 16:42           ` [dm-devel] " Vivek Goyal
  2016-03-02 16:06           ` Tejun Heo
  2 siblings, 0 replies; 22+ messages in thread
From: Chris Friesen @ 2016-02-25 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Nikolay Borisov
  Cc: dm-devel-H+wXaHxf7aLQT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	tejun-DgEjT+Ai2ygdnm+yROfE0A, Vivek Goyal

On 02/25/2016 08:53 AM, Mike Snitzer wrote:
> On Thu, Feb 25 2016 at  2:48am -0500,
> Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>> On 02/24/2016 08:12 PM, Chris Friesen wrote:
>>>
>>> Are there known limitations with the blkio cgroup controller when used
>>> with LVM?
>>>
>>> I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with
>>> CentOS 7.
>>>
>>> I set up two groups, /sys/fs/cgroup/blkio/test1 and
>>> /sys/fs/cgroup/blkio/test2.  I set the weight for test1 to 500, and the
>>> weight for test2 to 1000.
>>
>> The weighed mode of blkio works only with CFQ scheduler. And as far as I
>> have seen you cannot set CFQ to be the scheduler of DM devices. In this
>> case you can use the BLK io throttling mechanism. That's what I've
>> encountered in my practice. Though I'd be happy to be proven wrong by
>> someone. I believe the following sentence in the blkio controller states
>> that:
>> "
>> First one is proportional weight time based division of disk policy. It
>> is implemented in CFQ. Hence this policy takes effect only on leaf nodes
>> when CFQ is being used.
>> "
>
> Right, LVM created devices are bio-based DM devices in the kernel.
> bio-based block devices do _not_ have an IO scheduler.  Their underlying
> request-based device does.

In my particular case I did ensure that the underlying /dev/sda scheduler was 
set to CFQ.  I had expected that would be sufficient.

> I'm not well-versed on the top-level cgroup interface and how it maps to
> associated resources that are established in the kernel.  But it could
> be that the configuration of blkio cgroup against a bio-based LVM device
> needs to be passed through to the underlying request-based device
> (e.g. /dev/sda4 in Chris's case)?

In my case the process is placed in a cgroup, which is assigned a relative
weight.  I didn't specify any per-device weights.
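
For completeness, a per-device weight would look something like the
following (8:0 being the major:minor of /dev/sda); I did not use this,
so it's only a sketch:

  echo "8:0 500" > /sys/fs/cgroup/blkio/test1/blkio.weight_device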

My suspicion is that the cgroup information for the writing process is being 
lost as the request passes through the DM layer.

This is disappointing, given how long both DM and cgroups have been around.

Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] blkio cgroups controller doesn't work with LVM?
       [not found]         ` <20160225145314.GA20699-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-02-25 15:15           ` Chris Friesen
@ 2016-02-26 16:42           ` Vivek Goyal
       [not found]             ` <20160226164228.GA24711-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-03-02 16:06           ` Tejun Heo
  2 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2016-02-26 16:42 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Nikolay Borisov, Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	tejun-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
> On Thu, Feb 25 2016 at  2:48am -0500,
> Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
> 
> > 
> > 
> > On 02/24/2016 08:12 PM, Chris Friesen wrote:
> > > 
> > > Hi,
> > > 
> > > Are there known limitations with the blkio cgroup controller when used
> > > with LVM?
> > > 
> > > I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with
> > > CentOS 7.
> > > 
> > > I set up two groups, /sys/fs/cgroup/blkio/test1 and
> > > /sys/fs/cgroup/blkio/test2.  I set the weight for test1 to 500, and the
> > > weight for test2 to 1000.
> > 
> > The weighed mode of blkio works only with CFQ scheduler. And as far as I
> > have seen you cannot set CFQ to be the scheduler of DM devices. In this
> > case you can use the BLK io throttling mechanism. That's what I've
> > encountered in my practice. Though I'd be happy to be proven wrong by
> > someone. I believe the following sentence in the blkio controller states
> > that:
> > "
> > First one is proportional weight time based division of disk policy. It
> > is implemented in CFQ. Hence this policy takes effect only on leaf nodes
> > when CFQ is being used.
> > "
> 
> Right, LVM created devices are bio-based DM devices in the kernel.
> bio-based block devices do _not_ have an IO scheduler.  Their underlying
> request-based device does.
> 
> I'm not well-versed on the top-level cgroup interface and how it maps to
> associated resources that are established in the kernel.  But it could
> be that the configuration of blkio cgroup against a bio-based LVM device
> needs to be passed through to the underlying request-based device
> (e.g. /dev/sda4 in Chris's case)?
> 
> I'm also wondering whether the latest cgroup work that Tejun has just
> finished (afaik to support buffered IO in the IO controller) will afford
> us a more meaningful reason to work to make cgroups' blkio controller
> actually work with bio-based devices like LVM's DM devices?
> 
> I'm very much open to advice on how to proceed with investigating this
> integration work.  Tejun, Vivek, anyone else: if you have advice on next
> steps for DM on this front _please_ yell, thanks!

Ok, here is my understanding. Tejun, please correct me if that's not the
case anymore. I have not been able to keep pace with all the recent work.

IO throttling policies should be applied on top-level dm devices, and these
should work for reads and direct writes.
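
Something along these lines, with the major:minor of the dm device on your
system (the device name and numbers below are just an example):

  # find the dm device's major:minor, then cap its writes at ~10 MB/s
  lsblk -o NAME,MAJ:MIN /dev/mapper/vg0-lv0
  echo "253:0 10485760" > /sys/fs/cgroup/blkio/test1/blkio.throttle.write_bps_device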

For throttling of buffered writes, I think it might not work on dm devices,
because we might not be copying the cgroup information when cloning
happens in the dm layer.

IIRC, one concern with cloning cgroup info from the parent bio was how one
would take care of any priority inversion issues. For example, we might be
waiting for a clone to finish IO which is in a severely throttled IO cgroup,
and the rest of the IO can't proceed until that IO finishes.

IIUC, there might not be a straightforward answer to that question. We
will probably have to look closely at all the dm code and, if that
serialization is possible in any of the paths, reset the cgroup info there.

CFQ's proportional policy might not work well when a dm device is sitting
on top. The reason is that for all reads and direct writes we inherit the
cgroup from the submitter, and dm might be submitting IO from an internal
thread, losing the submitter's cgroup, so the IO gets misclassified at the
dm level.

To solve this, we will have to carry the submitter's cgroup info in the bio
and its clones, and again think about priority inversion issues.
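
One way to see the misclassification (assuming CFQ on the underlying disk
and the v1 blkio files) is to compare the per-cgroup accounting after a
test run; with a dm device in between, the IO tends to show up under the
root cgroup instead of the test cgroup:

  # bytes accounted to the test1 cgroup on the underlying device
  cat /sys/fs/cgroup/blkio/test1/blkio.io_service_bytes
  # compare with what the root cgroup was charged
  cat /sys/fs/cgroup/blkio/blkio.io_service_bytes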

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] blkio cgroups controller doesn't work with LVM?
       [not found]             ` <20160226164228.GA24711-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-02-26 16:45               ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2016-02-26 16:45 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Nikolay Borisov, Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, tj-DgEjT+Ai2ygdnm+yROfE0A

Looks like Tejun's email address in the original email is wrong. It should be
tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org and not
tejun-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org. Fixing it.

Thanks
Vivek

On Fri, Feb 26, 2016 at 11:42:28AM -0500, Vivek Goyal wrote:
> On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
> > On Thu, Feb 25 2016 at  2:48am -0500,
> > Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
> > 
> > > 
> > > 
> > > On 02/24/2016 08:12 PM, Chris Friesen wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > Are there known limitations with the blkio cgroup controller when used
> > > > with LVM?
> > > > 
> > > > I'm using Ubuntu 15.10 with the 4.2 kernel.  I got the same results with
> > > > CentOS 7.
> > > > 
> > > > I set up two groups, /sys/fs/cgroup/blkio/test1 and
> > > > /sys/fs/cgroup/blkio/test2.  I set the weight for test1 to 500, and the
> > > > weight for test2 to 1000.
> > > 
> > > The weighed mode of blkio works only with CFQ scheduler. And as far as I
> > > have seen you cannot set CFQ to be the scheduler of DM devices. In this
> > > case you can use the BLK io throttling mechanism. That's what I've
> > > encountered in my practice. Though I'd be happy to be proven wrong by
> > > someone. I believe the following sentence in the blkio controller states
> > > that:
> > > "
> > > First one is proportional weight time based division of disk policy. It
> > > is implemented in CFQ. Hence this policy takes effect only on leaf nodes
> > > when CFQ is being used.
> > > "
> > 
> > Right, LVM created devices are bio-based DM devices in the kernel.
> > bio-based block devices do _not_ have an IO scheduler.  Their underlying
> > request-based device does.
> > 
> > I'm not well-versed on the top-level cgroup interface and how it maps to
> > associated resources that are established in the kernel.  But it could
> > be that the configuration of blkio cgroup against a bio-based LVM device
> > needs to be passed through to the underlying request-based device
> > (e.g. /dev/sda4 in Chris's case)?
> > 
> > I'm also wondering whether the latest cgroup work that Tejun has just
> > finished (afaik to support buffered IO in the IO controller) will afford
> > us a more meaningful reason to work to make cgroups' blkio controller
> > actually work with bio-based devices like LVM's DM devices?
> > 
> > I'm very much open to advice on how to proceed with investigating this
> > integration work.  Tejun, Vivek, anyone else: if you have advice on next
> > steps for DM on this front _please_ yell, thanks!
> 
> Ok, here is my understanding. Tejun, please correct me if that's not the
> case anymore. I have not been able to keep pace with all the recent work.
> 
> IO throttling policies should be applied on top level dm devices and these
> should work for reads and direct writes.
> 
> For IO throttling buffered writes, I think it might not work on dm devices
> as it because we might not be copying cgroup information when cloning
> happens in dm layer.
> 
> IIRC, one concern with cloning cgroup info from parent bio was that how
> would one take care of any priority inversion issues. For example, we are
> waiting for a clone to finish IO which is in severely throttled IO cgroup
> and rest of the IO can't proceed till that IO finishes).
> 
> IIUC, there might not be a straight forward answer to that question. We
> probably will have to look at all the dm code closely and if that
> serialization is possible in any of the paths, then reset the cgroup info.
> 
> For CFQ's proportional policy, it might not work well when a dm device
> is sitting on top. And reason being that for all reads and direct writes
> we inherit cgroup from submitter and dm might be submitting IO from an
> internal thread, hence losing the cgroup of submitter hence IO gets
> misclassified at dm level.
> 
> To solve this, we will have to carry submitter's cgroup info in bio and
> clones and again think of priority inversion issues.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: blkio cgroups controller doesn't work with LVM?
       [not found]         ` <20160225145314.GA20699-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-02-25 15:15           ` Chris Friesen
  2016-02-26 16:42           ` [dm-devel] " Vivek Goyal
@ 2016-03-02 16:06           ` Tejun Heo
       [not found]             ` <20160302160649.GB29826-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  2 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2016-03-02 16:06 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Nikolay Borisov, Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Hello,

On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
> Right, LVM created devices are bio-based DM devices in the kernel.
> bio-based block devices do _not_ have an IO scheduler.  Their underlying
> request-based device does.

dm devices are not the actual resource source, so I don't think it'd
work too well to put io controllers on them (can't really do things
like proportional control without owning the queue).

> I'm not well-versed on the top-level cgroup interface and how it maps to
> associated resources that are established in the kernel.  But it could
> be that the configuration of blkio cgroup against a bio-based LVM device
> needs to be passed through to the underlying request-based device
> (e.g. /dev/sda4 in Chris's case)?
> 
> I'm also wondering whether the latest cgroup work that Tejun has just
> finished (afaik to support buffered IO in the IO controller) will afford
> us a more meaningful reason to work to make cgroups' blkio controller
> actually work with bio-based devices like LVM's DM devices?
> 
> I'm very much open to advice on how to proceed with investigating this
> integration work.  Tejun, Vivek, anyone else: if you have advice on next
> steps for DM on this front _please_ yell, thanks!

I think the only thing necessary is dm transferring bio cgroup tags to
the bios that it ends up passing down the stack.  Please take a look
at fs/btrfs/extent_io.c::btrfs_bio_clone() for an example.  We
probably should introduce a wrapper for this so that each site doesn't
need to ifdef it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
       [not found]             ` <20160302160649.GB29826-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2016-03-02 17:56               ` Mike Snitzer
       [not found]                 ` <20160302175656.GA59991-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Snitzer @ 2016-03-02 17:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Nikolay Borisov, Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal,
	linux-block-u79uwXL29TY76Z2rM5mHXA, axboe-tSWWG44O7X1aa/9Udqfwiw

On Wed, Mar 02 2016 at 11:06P -0500,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> Hello,
> 
> On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
> > Right, LVM created devices are bio-based DM devices in the kernel.
> > bio-based block devices do _not_ have an IO scheduler.  Their underlying
> > request-based device does.
> 
> dm devices are not the actual resource source, so I don't think it'd
> work too well to put io controllers on them (can't really do things
> like proportional control without owning the queue).
> 
> > I'm not well-versed on the top-level cgroup interface and how it maps to
> > associated resources that are established in the kernel.  But it could
> > be that the configuration of blkio cgroup against a bio-based LVM device
> > needs to be passed through to the underlying request-based device
> > (e.g. /dev/sda4 in Chris's case)?
> > 
> > I'm also wondering whether the latest cgroup work that Tejun has just
> > finished (afaik to support buffered IO in the IO controller) will afford
> > us a more meaningful reason to work to make cgroups' blkio controller
> > actually work with bio-based devices like LVM's DM devices?
> > 
> > I'm very much open to advice on how to proceed with investigating this
> > integration work.  Tejun, Vivek, anyone else: if you have advice on next
> > steps for DM on this front _please_ yell, thanks!
> 
> I think the only thing necessary is dm transferring bio cgroup tags to
> the bio's that it ends up passing down the stack.  Please take a look
> at fs/btrfs/extent_io.c::btrfs_bio_clone() for an example.  We
> probably should introduce a wrapper for this so that each site doesn't
> need to ifdef it.
> 
> Thanks.

OK, I think this should do it.  Nikolay and/or others, can you test this
patch using the blkio cgroups controller with LVM devices and report back?

From: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date: Wed, 2 Mar 2016 12:37:39 -0500
Subject: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()

Move btrfs_bio_clone()'s support for transferring a source bio's cgroup
tags to a clone into both bio_clone_bioset() and __bio_clone_fast().
The former is used by btrfs (MD and blk-core also use it via bio_split).
The latter is used by both DM and bcache.

This should enable the blkio cgroups controller to work with all
stacking bio-based block devices.

Reported-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
Suggested-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 block/bio.c          | 10 ++++++++++
 fs/btrfs/extent_io.c |  6 ------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index cf75915..25812be 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -584,6 +584,11 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_rw = bio_src->bi_rw;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
+
+#ifdef CONFIG_BLK_CGROUP
+	if (bio_src->bi_css)
+		bio_associate_blkcg(bio, bio_src->bi_css);
+#endif
 }
 EXPORT_SYMBOL(__bio_clone_fast);
 
@@ -689,6 +694,11 @@ integrity_clone:
 		}
 	}
 
+#ifdef CONFIG_BLK_CGROUP
+	if (bio_src->bi_css)
+		bio_associate_blkcg(bio, bio_src->bi_css);
+#endif
+
 	return bio;
 }
 EXPORT_SYMBOL(bio_clone_bioset);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 392592d..8abc330 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2691,12 +2691,6 @@ struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask)
 		btrfs_bio->csum = NULL;
 		btrfs_bio->csum_allocated = NULL;
 		btrfs_bio->end_io = NULL;
-
-#ifdef CONFIG_BLK_CGROUP
-		/* FIXME, put this into bio_clone_bioset */
-		if (bio->bi_css)
-			bio_associate_blkcg(new, bio->bi_css);
-#endif
 	}
 	return new;
 }
-- 
2.5.4 (Apple Git-61)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
       [not found]                 ` <20160302175656.GA59991-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-03-02 18:03                   ` Nikolay Borisov
       [not found]                     ` <CAJFSNy6hni1-NWDs0z=Hq223=DfcjsNPoAb7GRAGEPCUXh4Q9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-03-14 15:08                   ` Nikolay Borisov
  1 sibling, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-02 18:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Tejun Heo, Nikolay Borisov, Chris Friesen,
	device-mapper development, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Vivek Goyal, linux-block-u79uwXL29TY76Z2rM5mHXA, Jens Axboe,
	SiteGround Operations

Thanks for the patch, I will likely have time to test this sometime next week.
But just to be sure - the expected behavior would be that processes
writing to dm-based devices would experience the fair-share
scheduling of CFQ (provided that the physical devices that back those
DM devices use CFQ), correct?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
       [not found]                     ` <CAJFSNy6hni1-NWDs0z=Hq223=DfcjsNPoAb7GRAGEPCUXh4Q9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-02 18:05                       ` Mike Snitzer
  2016-03-02 19:18                       ` [PATCH] " Vivek Goyal
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Snitzer @ 2016-03-02 18:05 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jens Axboe, SiteGround Operations,
	linux-block-u79uwXL29TY76Z2rM5mHXA, device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Chris Friesen,
	Vivek Goyal

On Wed, Mar 02 2016 at  1:03P -0500,
Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:

> Thanks for the patch I will likely have time to test this sometime next week.
> But just to be sure - the expected behavior would be that processes
> writing to dm-based devices would experience the fair-shair
> scheduling of CFQ (provided that the physical devices that back those
> DM devices use CFQ), correct?

Yes, that is the goal.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
       [not found]                     ` <CAJFSNy6hni1-NWDs0z=Hq223=DfcjsNPoAb7GRAGEPCUXh4Q9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-03-02 18:05                       ` Mike Snitzer
@ 2016-03-02 19:18                       ` Vivek Goyal
  2016-03-02 19:59                         ` Nikolay Borisov
  1 sibling, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2016-03-02 19:18 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Mike Snitzer, Tejun Heo, Chris Friesen, device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA, Jens Axboe,
	SiteGround Operations

On Wed, Mar 02, 2016 at 08:03:10PM +0200, Nikolay Borisov wrote:
> Thanks for the patch I will likely have time to test this sometime next week.
> But just to be sure - the expected behavior would be that processes
> writing to dm-based devices would experience the fair-shair
> scheduling of CFQ (provided that the physical devices that back those
> DM devices use CFQ), correct?

Nikolay,

I am not sure how well it will work with CFQ on the underlying device. It
will get the cgroup information right for buffered writes. But the cgroup
information for reads and direct writes comes from the submitter's context,
and if the dm layer gets in between, then many times the submitter will be
a worker thread and the IO will be attributed to that worker's cgroup (the
root cgroup).

Give it a try anyway.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
  2016-03-02 19:18                       ` [PATCH] " Vivek Goyal
@ 2016-03-02 19:59                         ` Nikolay Borisov
  2016-03-02 20:10                           ` Vivek Goyal
  0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-02 19:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jens Axboe, Mike Snitzer, SiteGround Operations,
	linux-block@vger.kernel.org, device-mapper development,
	Nikolay Borisov, cgroups@vger.kernel.org, Tejun Heo,
	Chris Friesen



On Wednesday, March 2, 2016, Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Mar 02, 2016 at 08:03:10PM +0200, Nikolay Borisov wrote:
> > Thanks for the patch I will likely have time to test this sometime next
> week.
> > But just to be sure - the expected behavior would be that processes
> > writing to dm-based devices would experience the fair-shair
> > scheduling of CFQ (provided that the physical devices that back those
> > DM devices use CFQ), correct?
>
> Nikolay,
>
> I am not sure how well it will work with CFQ of underlying device. It will
> get cgroup information right for buffered writes. But cgroup information


Right, what's your definition of buffered writes? My mental model is that
when a process submits a write request to a dm device, the bio is going to
be put on a device workqueue, which is then serviced by a background
worker thread, and the submitter is notified later. Do you refer to this
whole gamut of operations as buffered writes?

> for reads and direct writes will come from submitter's context and if dm
> layer gets in between, then many a times submitter might be a worker
> thread and IO will be attributed to that worker's cgroup (root cgroup).


Be that as it may, provided that the worker thread is in the 'correct'
cgroup, then the appropriate bandwidth policies should apply, no?

>
> Give it a try anyway.


Most certainly I will :)


>
> Thanks
> Vivek
>




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
  2016-03-02 19:59                         ` Nikolay Borisov
@ 2016-03-02 20:10                           ` Vivek Goyal
  2016-03-02 20:19                             ` Nikolay Borisov
       [not found]                             ` <20160302201016.GE3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 2 replies; 22+ messages in thread
From: Vivek Goyal @ 2016-03-02 20:10 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jens Axboe, Mike Snitzer, SiteGround Operations,
	linux-block@vger.kernel.org, device-mapper development,
	Nikolay Borisov, cgroups@vger.kernel.org, Tejun Heo,
	Chris Friesen

On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:
> On Wednesday, March 2, 2016, Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Wed, Mar 02, 2016 at 08:03:10PM +0200, Nikolay Borisov wrote:
> > > Thanks for the patch I will likely have time to test this sometime next
> > week.
> > > But just to be sure - the expected behavior would be that processes
> > > writing to dm-based devices would experience the fair-shair
> > > scheduling of CFQ (provided that the physical devices that back those
> > > DM devices use CFQ), correct?
> >
> > Nikolay,
> >
> > I am not sure how well it will work with CFQ of underlying device. It will
> > get cgroup information right for buffered writes. But cgroup information
> 
> 
>  Right, what's your definition of  buffered writes?

Writes which go through the page cache.

> My mental model is that
> when a process submits a write request to a dm device , the bio is going to
> be put on a devi e workqueue which would then  be serviced by a background
> worker thread and later the submitter notified. Do you refer to this whole
> gamut of operations as buffered writes?

No, once the bio is submitted to the dm device it could be either a
buffered write or a direct write.

> 
> for reads and direct writes will come from submitter's context and if dm
> > layer gets in between, then many a times submitter might be a worker
> > thread and IO will be attributed to that worker's cgroup (root cgroup).
> 
> 
> Be that as it may, proivded that the worker thread is in the  'correct'
> cgroup,  then the appropriate babdwidth policies should apply, no?

The worker thread will most likely be in the root cgroup. So if a worker
thread is submitting the bio, it will be attributed to the root cgroup.

We had a similar issue with IO priority, and it did not work reliably with
CFQ on the underlying device when dm devices were sitting on top.

If we really want to give it a try, I guess we will have to put the
submitter's cgroup info in the bio early, at the time of bio creation, for
all kinds of IO. Not sure if it is worth the effort.

For the case of IO throttling, I think you should put the throttling rules
on the dm device itself. That means that as long as the filesystem supports
cgroups, you should be getting the right cgroup information for all kinds
of IO, and throttling should work just fine.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
  2016-03-02 20:10                           ` Vivek Goyal
@ 2016-03-02 20:19                             ` Nikolay Borisov
       [not found]                               ` <CAJFSNy6MUGr8E3RNw6hFiskcaG4m8EGdqMkQXVh1LGq-yZCjBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]                             ` <20160302201016.GE3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-02 20:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jens Axboe, Mike Snitzer, SiteGround Operations,
	linux-block@vger.kernel.org, device-mapper development,
	Nikolay Borisov, cgroups@vger.kernel.org, Tejun Heo,
	Chris Friesen



On Wednesday, March 2, 2016, Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:
> > On Wednesday, March 2, 2016, Vivek Goyal <vgoyal@redhat.com
> <javascript:;>> wrote:
> >
> > > On Wed, Mar 02, 2016 at 08:03:10PM +0200, Nikolay Borisov wrote:
> > > > Thanks for the patch I will likely have time to test this sometime
> next
> > > week.
> > > > But just to be sure - the expected behavior would be that processes
> > > > writing to dm-based devices would experience the fair-shair
> > > > scheduling of CFQ (provided that the physical devices that back those
> > > > DM devices use CFQ), correct?
> > >
> > > Nikolay,
> > >
> > > I am not sure how well it will work with CFQ of underlying device. It
> will
> > > get cgroup information right for buffered writes. But cgroup
> information
> >
> >
> >  Right, what's your definition of  buffered writes?
>
> Writes which go through page cache.
>
> > My mental model is that
> > when a process submits a write request to a dm device , the bio is going
> to
> > be put on a devi e workqueue which would then  be serviced by a
> background
> > worker thread and later the submitter notified. Do you refer to this
> whole
> > gamut of operations as buffered writes?
>
> No, once the bio is submitted to dm device it could be a buffered write or
> a direct write.
>
> >
> > for reads and direct writes will come from submitter's context and if dm
> > > layer gets in between, then many a times submitter might be a worker
> > > thread and IO will be attributed to that worker's cgroup (root cgroup).
> >
> >
> > Be that as it may, proivded that the worker thread is in the  'correct'
> > cgroup,  then the appropriate babdwidth policies should apply, no?
>
> Worker thread will most likely be in root cgroup. So if a worker thread
> is submitting bio, it will be attributed to root cgroup.
>
> We had similar issue with IO priority and it did not work reliably with
> CFQ on underlying device when dm devices were sitting on top.
>
> If we really want to give it a try, I guess we will have to put cgroup
> info of submitter early in bio at the time of bio creation even for all
> kind of IO. Not sure if it is worth the effort.
>
> For the case of IO throttling, I think you should put throttling rules on
> the dm device itself. That means as long as filesystem supports the
> cgroups, you should be getting right cgroup information for all kind of
> IO and throttling should work just fine.


Throttling does work even now, but the use case I had in mind was
proportional distribution of IO. Imagine 50 or so dm devices hosting
IO-intensive workloads. In this situation, I'd be interested in each of
them getting a proportional share of IO based on the weights set in the
blkcg controller for each workload's respective cgroup.

>
> Thanks
> Vivek
>




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                             ` <20160302201016.GE3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-03-02 20:34                               ` Chris Friesen
       [not found]                                 ` <56D74E6A.9050708-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Friesen @ 2016-03-02 20:34 UTC (permalink / raw)
  To: Vivek Goyal, Nikolay Borisov
  Cc: Nikolay Borisov, Mike Snitzer, Tejun Heo,
	device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jens Axboe,
	SiteGround Operations

On 03/02/2016 02:10 PM, Vivek Goyal wrote:
> On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:

> We had similar issue with IO priority and it did not work reliably with
> CFQ on underlying device when dm devices were sitting on top.
>
> If we really want to give it a try, I guess we will have to put cgroup
> info of submitter early in bio at the time of bio creation even for all
> kind of IO. Not sure if it is worth the effort.

As it stands, imagine that you have a hypervisor node running many VMs (or 
containers), each of which is assigned a separate logical volume (possibly 
thin-provisioned) as its rootfs.

Ideally we want the disk accesses by those VMs to be "fair" relative to each 
other, and we want to guarantee a certain amount of bandwidth for the host as well.

Without this sort of feature, how can we accomplish that?

> For the case of IO throttling, I think you should put throttling rules on
> the dm device itself. That means as long as filesystem supports the
> cgroups, you should be getting right cgroup information for all kind of
> IO and throttling should work just fine.

IO throttling isn't all that useful, since it requires you to know your IO
rate in advance.  And it doesn't adjust nicely as the number of competing
entities changes, the way that weight-based schemes do.

Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?)
       [not found]                               ` <CAJFSNy6MUGr8E3RNw6hFiskcaG4m8EGdqMkQXVh1LGq-yZCjBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-02 20:45                                 ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2016-03-02 20:45 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Nikolay Borisov, Mike Snitzer, Tejun Heo, Chris Friesen,
	device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jens Axboe,
	SiteGround Operations

On Wed, Mar 02, 2016 at 10:19:38PM +0200, Nikolay Borisov wrote:
> On Wednesday, March 2, 2016, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:
> > > On Wednesday, March 2, 2016, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > <javascript:;>> wrote:
> > >
> > > > On Wed, Mar 02, 2016 at 08:03:10PM +0200, Nikolay Borisov wrote:
> > > > > Thanks for the patch I will likely have time to test this sometime
> > next
> > > > week.
> > > > > But just to be sure - the expected behavior would be that processes
> > > > > writing to dm-based devices would experience the fair-shair
> > > > > scheduling of CFQ (provided that the physical devices that back those
> > > > > DM devices use CFQ), correct?
> > > >
> > > > Nikolay,
> > > >
> > > > I am not sure how well it will work with CFQ of underlying device. It
> > will
> > > > get cgroup information right for buffered writes. But cgroup
> > information
> > >
> > >
> > >  Right, what's your definition of  buffered writes?
> >
> > Writes which go through page cache.
> >
> > > My mental model is that
> > > when a process submits a write request to a dm device , the bio is going
> > to
> > > be put on a devi e workqueue which would then  be serviced by a
> > background
> > > worker thread and later the submitter notified. Do you refer to this
> > whole
> > > gamut of operations as buffered writes?
> >
> > No, once the bio is submitted to dm device it could be a buffered write or
> > a direct write.
> >
> > >
> > > for reads and direct writes will come from submitter's context and if dm
> > > > layer gets in between, then many a times submitter might be a worker
> > > > thread and IO will be attributed to that worker's cgroup (root cgroup).
> > >
> > >
> > > Be that as it may, proivded that the worker thread is in the  'correct'
> > > cgroup,  then the appropriate babdwidth policies should apply, no?
> >
> > Worker thread will most likely be in root cgroup. So if a worker thread
> > is submitting bio, it will be attributed to root cgroup.
> >
> > We had similar issue with IO priority and it did not work reliably with
> > CFQ on underlying device when dm devices were sitting on top.
> >
> > If we really want to give it a try, I guess we will have to put cgroup
> > info of submitter early in bio at the time of bio creation even for all
> > kind of IO. Not sure if it is worth the effort.
> >
> > For the case of IO throttling, I think you should put throttling rules on
> > the dm device itself. That means as long as filesystem supports the
> > cgroups, you should be getting right cgroup information for all kind of
> > IO and throttling should work just fine.
> 
> 
> Throttling does work even now,  but the use case I had in mind was
> proportional
> distribution of IO. Imagine 50  or so dm devices, hosting IO intensive
> workloads. In
> this situation, I'd  be interested each of them getting proportional IO
> based on the weights
> set in the blkcg controller for each respective cgroup for every workload.
> 

I see what you are trying to do: carry the cgroup information from the top
to the bottom of the IO stack for all kinds of IO.

I guess we also need to call bio_associate_current() when dm accepts a bio
from the submitter.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                                 ` <56D74E6A.9050708-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
@ 2016-03-02 21:04                                   ` Vivek Goyal
       [not found]                                     ` <20160302210405.GG3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2016-03-02 21:04 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Nikolay Borisov, Nikolay Borisov, Mike Snitzer, Tejun Heo,
	device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jens Axboe,
	SiteGround Operations

On Wed, Mar 02, 2016 at 02:34:50PM -0600, Chris Friesen wrote:
> On 03/02/2016 02:10 PM, Vivek Goyal wrote:
> >On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:
> 
> >We had similar issue with IO priority and it did not work reliably with
> >CFQ on underlying device when dm devices were sitting on top.
> >
> >If we really want to give it a try, I guess we will have to put cgroup
> >info of submitter early in bio at the time of bio creation even for all
> >kind of IO. Not sure if it is worth the effort.
> 
> As it stands, imagine that you have a hypervisor node running many VMs (or
> containers), each of which is assigned a separate logical volume (possibly
> thin-provisioned) as its rootfs.
> 
> Ideally we want the disk accesses by those VMs to be "fair" relative to each
> other, and we want to guarantee a certain amount of bandwidth for the host
> as well.
> 
> Without this sort of feature, how can we accomplish that?

As of now, you can't. I will try adding bio_associate_current() and see
if that, along with Mike's patches, gets you what you are looking for.

On a side note, have you tried using CFQ's proportional logic with multiple
VMs? Say, partition the disk, pass each partition to a VM/container, and do
the IO. My main concern is that by default each cgroup can add significant
idling overhead and kill the overall throughput of the disk (especially for
random IO, or if a cgroup does not have enough IO to keep the disk busy).

One can disable group idling but that kills service differentiation for
most of the workloads.

So I was curious to know whether CFQ's proportional bandwidth division is
helping you in real life (without dm, of course).

> 
> >For the case of IO throttling, I think you should put throttling rules on
> >the dm device itself. That means as long as filesystem supports the
> >cgroups, you should be getting right cgroup information for all kind of
> >IO and throttling should work just fine.
> 
> IO throttling isn't all that useful, since it requires you to know in
> advance what your IO rate is.  And it doesn't adjust nicely as the number of
> competing entities changes the way that weight-based schemes do.

Agreed that absolute limits are less useful than the dynamic limits
provided by weights. They are more useful for the scenario where a cloud
provider does not want to provide disk bandwidth if the user has not paid
for it (even if disk bandwidth is available).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                                     ` <20160302210405.GG3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-03-02 21:19                                       ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2016-03-02 21:19 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Nikolay Borisov, Nikolay Borisov, Mike Snitzer, Tejun Heo,
	device-mapper development,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jens Axboe,
	SiteGround Operations

On Wed, Mar 02, 2016 at 04:04:05PM -0500, Vivek Goyal wrote:
> On Wed, Mar 02, 2016 at 02:34:50PM -0600, Chris Friesen wrote:
> > On 03/02/2016 02:10 PM, Vivek Goyal wrote:
> > >On Wed, Mar 02, 2016 at 09:59:13PM +0200, Nikolay Borisov wrote:
> > 
> > >We had similar issue with IO priority and it did not work reliably with
> > >CFQ on underlying device when dm devices were sitting on top.
> > >
> > >If we really want to give it a try, I guess we will have to put cgroup
> > >info of submitter early in bio at the time of bio creation even for all
> > >kind of IO. Not sure if it is worth the effort.
> > 
> > As it stands, imagine that you have a hypervisor node running many VMs (or
> > containers), each of which is assigned a separate logical volume (possibly
> > thin-provisioned) as its rootfs.
> > 
> > Ideally we want the disk accesses by those VMs to be "fair" relative to each
> > other, and we want to guarantee a certain amount of bandwidth for the host
> > as well.
> > 
> > Without this sort of feature, how can we accomplish that?
> 
> As of now, you can't. I will try adding bio_associate_current() and see
> if that along with Mike's patches gets you what you are looking for.
> 

Can you try the following as well, along with Mike's patch that carries the
cgroup info over to the clones?

Mike, is this the right place in the dm layer to hook into? I think this
will take care of bio-based targets.

Even after this I think there are still two issues.

- bio_associate_current() assumes that the submitter already has an io
  context and otherwise does nothing. So in this case, if the container/VM
  process does not have an io context, nothing will happen.

- We will also need a mechanism to carry the io context information when we
  clone the bio. Otherwise we will get the cgroup of the original process
  and the io context of the dm thread (kind of odd).


---
 drivers/md/dm.c |    1 +
 1 file changed, 1 insertion(+)

Index: rhvgoyal-linux/drivers/md/dm.c
===================================================================
--- rhvgoyal-linux.orig/drivers/md/dm.c	2016-03-02 19:19:12.301000000 +0000
+++ rhvgoyal-linux/drivers/md/dm.c	2016-03-02 21:11:01.357000000 +0000
@@ -1769,6 +1769,7 @@ static blk_qc_t dm_make_request(struct r
 
 	generic_start_io_acct(rw, bio_sectors(bio), &dm_disk(md)->part0);
 
+	bio_associate_current(bio);
 	/* if we're suspended, we have to queue this io for later */
 	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
 		dm_put_live_table(md, srcu_idx);
Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                 ` <20160302175656.GA59991-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-03-02 18:03                   ` Nikolay Borisov
@ 2016-03-14 15:08                   ` Nikolay Borisov
       [not found]                     ` <56E6D3D3.4070104-6AxghH7DbtA@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-14 15:08 UTC (permalink / raw)
  To: Mike Snitzer, Tejun Heo
  Cc: Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal,
	linux-block-u79uwXL29TY76Z2rM5mHXA, axboe-tSWWG44O7X1aa/9Udqfwiw



On 03/02/2016 07:56 PM, Mike Snitzer wrote:
> On Wed, Mar 02 2016 at 11:06P -0500,
> Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> 
>> Hello,
>>
>> On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
>>> Right, LVM created devices are bio-based DM devices in the kernel.
>>> bio-based block devices do _not_ have an IO scheduler.  Their underlying
>>> request-based device does.
>>
>> dm devices are not the actual resource source, so I don't think it'd
>> work too well to put io controllers on them (can't really do things
>> like proportional control without owning the queue).
>>
>>> I'm not well-versed on the top-level cgroup interface and how it maps to
>>> associated resources that are established in the kernel.  But it could
>>> be that the configuration of blkio cgroup against a bio-based LVM device
>>> needs to be passed through to the underlying request-based device
>>> (e.g. /dev/sda4 in Chris's case)?
>>>
>>> I'm also wondering whether the latest cgroup work that Tejun has just
>>> finished (afaik to support buffered IO in the IO controller) will afford
>>> us a more meaningful reason to work to make cgroups' blkio controller
>>> actually work with bio-based devices like LVM's DM devices?
>>>
>>> I'm very much open to advice on how to proceed with investigating this
>>> integration work.  Tejun, Vivek, anyone else: if you have advice on next
>>> steps for DM on this front _please_ yell, thanks!
>>
>> I think the only thing necessary is dm transferring bio cgroup tags to
>> the bio's that it ends up passing down the stack.  Please take a look
>> at fs/btrfs/extent_io.c::btrfs_bio_clone() for an example.  We
>> probably should introduce a wrapper for this so that each site doesn't
>> need to ifdef it.
>>
>> Thanks.
> 
> OK, I think this should do it.  Nikolay and/or others can you test this
> patch using blkio cgroups controller with LVM devices and report back?
> 
> From: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Date: Wed, 2 Mar 2016 12:37:39 -0500
> Subject: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
> 
> Move btrfs_bio_clone()'s support for transferring a source bio's cgroup
> tags to a clone into both bio_clone_bioset() and __bio_clone_fast().
> The former is used by btrfs (MD and blk-core also use it via bio_split).
> The latter is used by both DM and bcache.
> 
> This should enable the blkio cgroups controller to work with all
> stacking bio-based block devices.
> 
> Reported-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> Suggested-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  block/bio.c          | 10 ++++++++++
>  fs/btrfs/extent_io.c |  6 ------
>  2 files changed, 10 insertions(+), 6 deletions(-)


So I had a chance to test this; here is what I got when running
2 containers, using LVM-thin for their root devices and having applied
your patch:

When the 2 containers use the same blkio.weight value (500), I get the
following from running dd simultaneously in the 2 containers:

[root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 165.171 s, 19.0 MB/s

[root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 166.165 s, 18.9 MB/s

Also, iostat showed the 2 volumes using almost the same amount of
IO (around 20 MB/s r/w). I then increased the weight for c1501 to 1000,
i.e. twice the bandwidth of c1500, so I would expect its dd to complete
twice as fast:

[root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 150.892 s, 20.8 MB/s


[root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 157.167 s, 20.0 MB/s

Now repeating the same tests, but this time going through the page cache
(echo 3 > /proc/sys/vm/drop_caches was executed before each test run):

With equal weights (500):
[root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 114.923 s, 27.4 MB/s

[root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 120.245 s, 26.2 MB/s

With c1501's weight (1000) equal to twice that of c1500 (500):

[root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 99.0181 s, 31.8 MB/s

[root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 122.872 s, 25.6 MB/s

I'd say that for buffered IO your patch does indeed make a difference,
which sort of aligns with what Vivek said about the patch working for
buffered writes but not for direct writes.

I will now proceed to test with his patch applied, for the case of
direct writes.

Hope this helps. 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                     ` <56E6D3D3.4070104-6AxghH7DbtA@public.gmane.org>
@ 2016-03-14 15:31                       ` Nikolay Borisov
       [not found]                         ` <56E6D95B.3030904-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-14 15:31 UTC (permalink / raw)
  To: Mike Snitzer, Tejun Heo
  Cc: Chris Friesen, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal,
	linux-block-u79uwXL29TY76Z2rM5mHXA, axboe-tSWWG44O7X1aa/9Udqfwiw



On 03/14/2016 05:08 PM, Nikolay Borisov wrote:
> 
> 
> On 03/02/2016 07:56 PM, Mike Snitzer wrote:
>> On Wed, Mar 02 2016 at 11:06P -0500,
>> Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>
>>> Hello,
>>>
>>> On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
>>>> Right, LVM created devices are bio-based DM devices in the kernel.
>>>> bio-based block devices do _not_ have an IO scheduler.  Their underlying
>>>> request-based device does.
>>>
>>> dm devices are not the actual resource source, so I don't think it'd
>>> work too well to put io controllers on them (can't really do things
>>> like proportional control without owning the queue).
>>>
>>>> I'm not well-versed on the top-level cgroup interface and how it maps to
>>>> associated resources that are established in the kernel.  But it could
>>>> be that the configuration of blkio cgroup against a bio-based LVM device
>>>> needs to be passed through to the underlying request-based device
>>>> (e.g. /dev/sda4 in Chris's case)?
>>>>
>>>> I'm also wondering whether the latest cgroup work that Tejun has just
>>>> finished (afaik to support buffered IO in the IO controller) will afford
>>>> us a more meaningful reason to work to make cgroups' blkio controller
>>>> actually work with bio-based devices like LVM's DM devices?
>>>>
>>>> I'm very much open to advice on how to proceed with investigating this
>>>> integration work.  Tejun, Vivek, anyone else: if you have advice on next
>>>> steps for DM on this front _please_ yell, thanks!
>>>
>>> I think the only thing necessary is dm transferring bio cgroup tags to
>>> the bio's that it ends up passing down the stack.  Please take a look
>>> at fs/btrfs/extent_io.c::btrfs_bio_clone() for an example.  We
>>> probably should introduce a wrapper for this so that each site doesn't
>>> need to ifdef it.
>>>
>>> Thanks.
>>
>> OK, I think this should do it.  Nikolay and/or others can you test this
>> patch using blkio cgroups controller with LVM devices and report back?
>>
>> From: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Date: Wed, 2 Mar 2016 12:37:39 -0500
>> Subject: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
>>
>> Move btrfs_bio_clone()'s support for transferring a source bio's cgroup
>> tags to a clone into both bio_clone_bioset() and __bio_clone_fast().
>> The former is used by btrfs (MD and blk-core also use it via bio_split).
>> The latter is used by both DM and bcache.
>>
>> This should enable the blkio cgroups controller to work with all
>> stacking bio-based block devices.
>>
>> Reported-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
>> Suggested-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Signed-off-by: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>  block/bio.c          | 10 ++++++++++
>>  fs/btrfs/extent_io.c |  6 ------
>>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> 
> So I had a chance to test the settings here is what I got when running 
> 2 container, using LVM-thin for their root device and having applied 
> your patch: 
> 
> When the 2 containers are using the same blkio.weight values (500) I 
> get the following from running DD simultaneously on the 2 containers: 
> 
> [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 165.171 s, 19.0 MB/s
> 
> [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 166.165 s, 18.9 MB/s
> 
> Also iostat showed the 2 volumes using almost the same amount of 
> IO (around 20mb r/w). I then increase the weight for c1501 to 1000 i.e. 
> twice the bandwidth that c1500 has, so I would expect its dd to complete
> twice as fast: 
> 
> [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 150.892 s, 20.8 MB/s
> 
> 
> [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 157.167 s, 20.0 MB/s
> 
> Now repeating the same tests but this time using the page-cache 
> (echo 3 > /proc/sys/vm/drop_caches) was executed before each test run: 
> 
> With equal weights (500):
> [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 114.923 s, 27.4 MB/s
> 
> [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 120.245 s, 26.2 MB/s
> 
> With (c1501's weight equal to twice that of c1500 (1000)):
> 
> [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 99.0181 s, 31.8 MB/s
> 
> [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 122.872 s, 25.6 MB/s

And another test which makes it obvious that your patch works:

[root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=6000
6000+0 records in
6000+0 records out
6291456000 bytes (6.3 GB) copied, 210.466 s, 29.9 MB/s

[root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 201.118 s, 15.6 MB/s


So a file that is twice the size of the other (6 GB vs 3 GB) is copied in
almost the same amount of time, at 2x the bandwidth.
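
The arithmetic: 6291456000 B / 210.466 s ~ 29.9 MB/s for c1501 versus
3145728000 B / 201.118 s ~ 15.6 MB/s for c1500, a throughput ratio of
roughly 1.9:1, which matches the configured 1000:500 weight ratio to
within a few percent.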


> 
> I'd say that for buffered IO your patch does indeed make a difference, 
> and this sort of aligns with what Vivek said about the patch
> working for buffered writes but not for direct. 
> 
> I will proceed now and test his patch applied for the case of 
> direct writes. 
> 
> Hope this helps. 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                         ` <56E6D95B.3030904-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org>
@ 2016-03-14 19:49                           ` Mike Snitzer
       [not found]                             ` <20160314194912.GA6975-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Snitzer @ 2016-03-14 19:49 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Tejun Heo, axboe-tSWWG44O7X1aa/9Udqfwiw,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Chris Friesen, Vivek Goyal

On Mon, Mar 14 2016 at 11:31am -0400,
Nikolay Borisov <n.borisov-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org> wrote:

> 
> 
> On 03/14/2016 05:08 PM, Nikolay Borisov wrote:
> > 
> > 
> > On 03/02/2016 07:56 PM, Mike Snitzer wrote:
> >> On Wed, Mar 02 2016 at 11:06P -0500,
> >> Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >>
> >>> Hello,
> >>>
> >>> On Thu, Feb 25, 2016 at 09:53:14AM -0500, Mike Snitzer wrote:
> >>>> Right, LVM created devices are bio-based DM devices in the kernel.
> >>>> bio-based block devices do _not_ have an IO scheduler.  Their underlying
> >>>> request-based device does.
> >>>
> >>> dm devices are not the actual resource source, so I don't think it'd
> >>> work too well to put io controllers on them (can't really do things
> >>> like proportional control without owning the queue).
> >>>
> >>>> I'm not well-versed on the top-level cgroup interface and how it maps to
> >>>> associated resources that are established in the kernel.  But it could
> >>>> be that the configuration of blkio cgroup against a bio-based LVM device
> >>>> needs to be passed through to the underlying request-based device
> >>>> (e.g. /dev/sda4 in Chris's case)?
> >>>>
> >>>> I'm also wondering whether the latest cgroup work that Tejun has just
> >>>> finished (afaik to support buffered IO in the IO controller) will afford
> >>>> us a more meaningful reason to work to make cgroups' blkio controller
> >>>> actually work with bio-based devices like LVM's DM devices?
> >>>>
> >>>> I'm very much open to advice on how to proceed with investigating this
> >>>> integration work.  Tejun, Vivek, anyone else: if you have advice on next
> >>>> steps for DM on this front _please_ yell, thanks!
> >>>
> >>> I think the only thing necessary is dm transferring bio cgroup tags to
> >>> the bio's that it ends up passing down the stack.  Please take a look
> >>> at fs/btrfs/extent_io.c::btrfs_bio_clone() for an example.  We
> >>> probably should introduce a wrapper for this so that each site doesn't
> >>> need to ifdef it.
> >>>
> >>> Thanks.
> >>
> >> OK, I think this should do it.  Nikolay and/or others can you test this
> >> patch using blkio cgroups controller with LVM devices and report back?
> >>
> >> From: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> Date: Wed, 2 Mar 2016 12:37:39 -0500
> >> Subject: [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
> >>
> >> Move btrfs_bio_clone()'s support for transferring a source bio's cgroup
> >> tags to a clone into both bio_clone_bioset() and __bio_clone_fast().
> >> The former is used by btrfs (MD and blk-core also use it via bio_split).
> >> The latter is used by both DM and bcache.
> >>
> >> This should enable the blkio cgroups controller to work with all
> >> stacking bio-based block devices.
> >>
> >> Reported-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> >> Suggested-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >> Signed-off-by: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> ---
> >>  block/bio.c          | 10 ++++++++++
> >>  fs/btrfs/extent_io.c |  6 ------
> >>  2 files changed, 10 insertions(+), 6 deletions(-)
> > 
> > 
> > So I had a chance to test the settings here is what I got when running 
> > 2 container, using LVM-thin for their root device and having applied 
> > your patch: 
> > 
> > When the 2 containers are using the same blkio.weight values (500) I 
> > get the following from running DD simultaneously on the 2 containers: 
> > 
> > [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 165.171 s, 19.0 MB/s
> > 
> > [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 166.165 s, 18.9 MB/s
> > 
> > Also iostat showed the 2 volumes using almost the same amount of 
> > IO (around 20mb r/w). I then increase the weight for c1501 to 1000 i.e. 
> > twice the bandwidth that c1500 has, so I would expect its dd to complete
> > twice as fast: 
> > 
> > [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 150.892 s, 20.8 MB/s
> > 
> > 
> > [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000 oflag=direct
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 157.167 s, 20.0 MB/s
> > 
> > Now repeating the same tests but this time using the page-cache 
> > (echo 3 > /proc/sys/vm/drop_caches) was executed before each test run: 
> > 
> > With equal weights (500):
> > [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 114.923 s, 27.4 MB/s
> > 
> > [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 120.245 s, 26.2 MB/s
> > 
> > With (c1501's weight equal to twice that of c1500 (1000)):
> > 
> > [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=3000
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 99.0181 s, 31.8 MB/s
> > 
> > [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
> > 3000+0 records in
> > 3000+0 records out
> > 3145728000 bytes (3.1 GB) copied, 122.872 s, 25.6 MB/s
> 
> And another test which makes it obvious that your patch works:
> 
> [root@c1501 ~]# dd if=test.img of=test2.img bs=1M count=6000
> 6000+0 records in
> 6000+0 records out
> 6291456000 bytes (6.3 GB) copied, 210.466 s, 29.9 MB/s
> 
> [root@c1500 ~]# dd if=test.img of=test2.img bs=1M count=3000
> 3000+0 records in
> 3000+0 records out
> 3145728000 bytes (3.1 GB) copied, 201.118 s, 15.6 MB/s
> 
> 
> So a file that is twice the size of another one (6vs3 g) is copied for
> almost the same amount of time with 2x the bandwidth.

Great.

Jens, can you pick up the patch in question ("[PATCH] block: transfer
source bio's cgroup tags to clone via bio_associate_blkcg()") that I
posted in this thread?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: block: transfer source bio's cgroup tags to clone via bio_associate_blkcg()
       [not found]                             ` <20160314194912.GA6975-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-03-14 22:08                               ` Nikolay Borisov
  0 siblings, 0 replies; 22+ messages in thread
From: Nikolay Borisov @ 2016-03-14 22:08 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Tejun Heo, Jens Axboe, linux-block-u79uwXL29TY76Z2rM5mHXA,
	device-mapper development, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Chris Friesen, Vivek Goyal

>
> Great.
>
> Jens, can you pick up the patch in question ("[PATCH] block: transfer
> source bio's cgroup tags to clone via bio_associate_blkcg()") that I
> posted in this thread?

And what about Vivek's patch associating the source bio with the io
context of the process issuing the IO?
Would that help in the DIO case?
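
For anyone following along, Vivek's patch itself is not included in this
excerpt; the existing helper for that kind of association is
bio_associate_current() in block/bio.c, and a sketch of how it is typically
used at submission time (purely an illustration of the mechanism, not his
actual patch; the function and parameter names here are made up) looks like:

	/* illustrative direct-IO style submission tying the bio to the submitter */
	static void submit_dio_bio(struct block_device *bdev, sector_t sector,
				   unsigned int nr_pages)
	{
		struct bio *bio = bio_alloc(GFP_NOIO, nr_pages);

		bio->bi_bdev = bdev;
		bio->bi_iter.bi_sector = sector;
		/* ... add data pages with bio_add_page() ... */
		bio_associate_current(bio);	/* takes a ref on current's io_context and blkcg */
		submit_bio(WRITE, bio);
	}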

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads: [~2016-03-14 22:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-24 18:12 blkio cgroups controller doesn't work with LVM? Chris Friesen
     [not found] ` <56CDF283.9010802-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
2016-02-25  7:48   ` Nikolay Borisov
     [not found]     ` <56CEB1BC.4000005-6AxghH7DbtA@public.gmane.org>
2016-02-25 14:53       ` Mike Snitzer
     [not found]         ` <20160225145314.GA20699-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-02-25 15:15           ` Chris Friesen
2016-02-26 16:42           ` [dm-devel] " Vivek Goyal
     [not found]             ` <20160226164228.GA24711-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-02-26 16:45               ` Vivek Goyal
2016-03-02 16:06           ` Tejun Heo
     [not found]             ` <20160302160649.GB29826-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-03-02 17:56               ` [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() (was: Re: blkio cgroups controller doesn't work with LVM?) Mike Snitzer
     [not found]                 ` <20160302175656.GA59991-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-03-02 18:03                   ` Nikolay Borisov
     [not found]                     ` <CAJFSNy6hni1-NWDs0z=Hq223=DfcjsNPoAb7GRAGEPCUXh4Q9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-02 18:05                       ` Mike Snitzer
2016-03-02 19:18                       ` [PATCH] " Vivek Goyal
2016-03-02 19:59                         ` Nikolay Borisov
2016-03-02 20:10                           ` Vivek Goyal
2016-03-02 20:19                             ` Nikolay Borisov
     [not found]                               ` <CAJFSNy6MUGr8E3RNw6hFiskcaG4m8EGdqMkQXVh1LGq-yZCjBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-02 20:45                                 ` Vivek Goyal
     [not found]                             ` <20160302201016.GE3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-03-02 20:34                               ` [PATCH] block: transfer source bio's cgroup tags to clone via bio_associate_blkcg() Chris Friesen
     [not found]                                 ` <56D74E6A.9050708-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>
2016-03-02 21:04                                   ` Vivek Goyal
     [not found]                                     ` <20160302210405.GG3476-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-03-02 21:19                                       ` Vivek Goyal
2016-03-14 15:08                   ` Nikolay Borisov
     [not found]                     ` <56E6D3D3.4070104-6AxghH7DbtA@public.gmane.org>
2016-03-14 15:31                       ` Nikolay Borisov
     [not found]                         ` <56E6D95B.3030904-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org>
2016-03-14 19:49                           ` Mike Snitzer
     [not found]                             ` <20160314194912.GA6975-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-03-14 22:08                               ` Nikolay Borisov

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).