* dm-ioband + bio-cgroup benchmarks
@ 2008-09-18 12:04 Ryo Tsuruta
2008-09-18 13:15 ` Vivek Goyal
` (4 more replies)
0 siblings, 5 replies; 140+ messages in thread
From: Ryo Tsuruta @ 2008-09-18 12:04 UTC (permalink / raw)
To: linux-kernel, dm-devel, containers, virtualization, xen-devel
Cc: agk, balbir, xemul, kamezawa.hiroyu, fernando
Hi All,
I have got excellent results with dm-ioband, which controls the disk I/O
bandwidth even when it accepts delayed write requests.
This time, I ran some benchmarks on a high-end storage array. The reason
was to avoid performance bottlenecks due to mechanical factors such as
seek time.
You can see the details of the benchmarks at:
http://people.valinux.co.jp/~ryov/dm-ioband/hps/
Thanks,
Ryo Tsuruta
^ permalink raw reply [flat|nested] 140+ messages in thread

* Re: dm-ioband + bio-cgroup benchmarks
From: Vivek Goyal @ 2008-09-18 13:15 UTC (permalink / raw)
To: Ryo Tsuruta
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
    dm-devel, Andrea Righi, agk, xemul, fernando, balbir

On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote:
> Hi All,
>
> I have got excellent results with dm-ioband, which controls the disk I/O
> bandwidth even when it accepts delayed write requests.
>
> This time, I ran some benchmarks on a high-end storage array. The reason
> was to avoid performance bottlenecks due to mechanical factors such as
> seek time.
>
> You can see the details of the benchmarks at:
> http://people.valinux.co.jp/~ryov/dm-ioband/hps/

Hi Ryo,

I had a query about the dm-ioband patches. IIUC, the dm-ioband patches
will break the notion of process priority in CFQ, because the dm-ioband
device will now hold bios and issue them to the lower layers later, based
on which bios become ready. Hence the actual bio-submitting context might
be different, and because CFQ derives the io_context from the current
task, it will be broken.

To mitigate that problem, we probably need to implement Fernando's
suggestion of putting an io_context pointer in the bio.

Have you already done something to solve this issue?

Secondly, why do we have to create an additional dm-ioband device for
every device we want to control using rules? This looks a little odd, at
least to me. Can't we keep it in line with the rest of the controllers,
where task grouping takes place using cgroups and the rules are specified
in the cgroup itself (the way Andrea Righi does for the io-throttling
patches)?

To avoid stacking another device (dm-ioband) on top of every device we
want to subject to rules, I was thinking of maintaining an rb-tree per
request queue. Requests would first go into this rb-tree upon
__make_request() and then filter down to the elevator associated with the
queue (if there is one). This would give us control over releasing bios
to the elevator based on policies (proportional weight, max bandwidth,
etc.) with no need to stack an additional block device.

I am working on some experimental proof-of-concept patches. It will take
some time, though.

I was thinking of the following:

- Adopt Andrea Righi's style of specifying rules for devices, and group
  the tasks using cgroups.

- To begin with, adopt dm-ioband's approach of a proportional bandwidth
  controller. It makes sense to me to limit the bandwidth usage only in
  case of contention. If there is really a need to limit max bandwidth,
  then we can probably implement additional rules, or a policy switcher
  where the user can decide what kind of policies should be in effect.

- Get rid of dm-ioband and instead buffer requests on an rb-tree on
  every request queue, controlled by some kind of cgroup rules.

It would be good to discuss the above approach now, whether it makes
sense or not. I think it is a kind of fusion of the io-throttling and
dm-ioband patches, with the additional idea of doing I/O control just
above the elevator on the request queue using an rb-tree.

Thanks
Vivek
* Re: dm-ioband + bio-cgroup benchmarks
From: Andrea Righi @ 2008-09-18 14:37 UTC (permalink / raw)
To: Vivek Goyal
Cc: Ryo Tsuruta, linux-kernel, dm-devel, containers, virtualization,
    xen-devel, fernando, balbir, xemul, agk, jens.axboe

Vivek Goyal wrote:
> [snip]
>
> To avoid stacking another device (dm-ioband) on top of every device we
> want to subject to rules, I was thinking of maintaining an rb-tree per
> request queue. Requests would first go into this rb-tree upon
> __make_request() and then filter down to the elevator associated with
> the queue (if there is one). This would give us control over releasing
> bios to the elevator based on policies (proportional weight, max
> bandwidth, etc.) with no need to stack an additional block device.
>
> I am working on some experimental proof-of-concept patches. It will
> take some time, though.
>
> I was thinking of the following:
>
> - Adopt Andrea Righi's style of specifying rules for devices, and group
>   the tasks using cgroups.
>
> - To begin with, adopt dm-ioband's approach of a proportional bandwidth
>   controller. It makes sense to me to limit the bandwidth usage only in
>   case of contention. If there is really a need to limit max bandwidth,
>   then we can probably implement additional rules, or a policy switcher
>   where the user can decide what kind of policies should be in effect.
>
> - Get rid of dm-ioband and instead buffer requests on an rb-tree on
>   every request queue, controlled by some kind of cgroup rules.
>
> It would be good to discuss the above approach now, whether it makes
> sense or not. I think it is a kind of fusion of the io-throttling and
> dm-ioband patches, with the additional idea of doing I/O control just
> above the elevator on the request queue using an rb-tree.

Thanks Vivek. It all sounds reasonable to me and I think this is the
right way to proceed.

I'll try to design and implement your rb-tree per request-queue idea in
my io-throttle controller; maybe we can reuse it for a more generic
solution as well. Feel free to send me your experimental proof of concept
if you want, even if it's not yet complete; I can review it, test it, and
contribute.

-Andrea
* Re: dm-ioband + bio-cgroup benchmarks
From: Vivek Goyal @ 2008-09-18 15:06 UTC (permalink / raw)
To: Andrea Righi
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
    dm-devel, agk, xemul, fernando, balbir

On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote:
> Vivek Goyal wrote:
> [snip]
>
> Thanks Vivek. It all sounds reasonable to me and I think this is the
> right way to proceed.
>
> I'll try to design and implement your rb-tree per request-queue idea in
> my io-throttle controller; maybe we can reuse it for a more generic
> solution as well. Feel free to send me your experimental proof of
> concept if you want, even if it's not yet complete; I can review it,
> test it, and contribute.

Currently I have taken code from bio-cgroup to implement cgroups and to
provide the functionality to associate a bio with a cgroup. I need this
to be able to queue bios at the right node in the rb-tree, and also to
decide when is the right time to release a few requests.

Right now, with a crude implementation, I am working on making the system
boot. Once the patches are at least in some kind of working shape, I will
send them to you to have a look.

Thanks
Vivek
* Re: dm-ioband + bio-cgroup benchmarks
From: Andrea Righi @ 2008-09-18 15:18 UTC (permalink / raw)
To: Vivek Goyal, Ryo Tsuruta
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
    dm-devel, agk, xemul, fernando, balbir

Vivek Goyal wrote:
> [snip]
>
> Currently I have taken code from bio-cgroup to implement cgroups and to
> provide the functionality to associate a bio with a cgroup. I need this
> to be able to queue bios at the right node in the rb-tree, and also to
> decide when is the right time to release a few requests.
>
> Right now, with a crude implementation, I am working on making the
> system boot. Once the patches are at least in some kind of working
> shape, I will send them to you to have a look.

I wonder... wouldn't it be simpler to just use the memory controller to
retrieve this information, starting from struct page? I mean, following
this path (in short, obviously using the appropriate interfaces for
locking and referencing the different objects):

  cgrp = page->page_cgroup->mem_cgroup->css.cgroup

Once you get the cgrp, it is very easy to use the corresponding controller
structure. Actually, this is how I do it in cgroup-io-throttle to
associate a bio with a cgroup. What other functionality/advantages does
bio-cgroup provide in addition to that?

Thanks,
-Andrea
[parent not found: <20080918150634.GH20640-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080918150634.GH20640-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-09-18 15:18 ` Andrea Righi 0 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-18 15:18 UTC (permalink / raw) To: Vivek Goyal, Ryo Tsuruta Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Vivek Goyal wrote: > On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: >> Vivek Goyal wrote: >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: >>>> Hi All, >>>> >>>> I have got excellent results of dm-ioband, that controls the disk I/O >>>> bandwidth even when it accepts delayed write requests. >>>> >>>> In this time, I ran some benchmarks with a high-end storage. The >>>> reason was to avoid a performance bottleneck due to mechanical factors >>>> such as seek time. >>>> >>>> You can see the details of the benchmarks at: >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ >>>> >>> Hi Ryo, >>> >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break >>> the notion of process priority in CFQ because now dm-ioband device will >>> hold the bio and issue these to lower layers later based on which bio's >>> become ready. Hence actual bio submitting context might be different and >>> because cfq derives the io_context from current task, it will be broken. >>> >>> To mitigate that problem, we probably need to implement Fernando's >>> suggestion of putting io_context pointer in bio. >>> >>> Have you already done something to solve this issue? 
>>> >>> Secondly, why do we have to create an additional dm-ioband device for >>> every device we want to control using rules. This looks little odd >>> atleast to me. Can't we keep it in line with rest of the controllers >>> where task grouping takes place using cgroup and rules are specified in >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? >>> >>> To avoid creation of stacking another device (dm-ioband) on top of every >>> device we want to subject to rules, I was thinking of maintaining an >>> rb-tree per request queue. Requests will first go into this rb-tree upon >>> __make_request() and then will filter down to elevator associated with the >>> queue (if there is one). This will provide us the control of releasing >>> bio's to elevaor based on policies (proportional weight, max bandwidth >>> etc) and no need of stacking additional block device. >>> >>> I am working on some experimental proof of concept patches. It will take >>> some time though. >>> >>> I was thinking of following. >>> >>> - Adopt the Andrea Righi's style of specifying rules for devices and >>> group the tasks using cgroups. >>> >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth >>> controller. It makes sense to me limit the bandwidth usage only in >>> case of contention. If there is really a need to limit max bandwidth, >>> then probably we can do something to implement additional rules or >>> implement some policy switcher where user can decide what kind of >>> policies need to be implemented. >>> >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every >>> request queue which is controlled by some kind of cgroup rules. >>> >>> It would be good to discuss above approach now whether it makes sense or >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches >>> with additional idea of doing io-control just above elevator on the request >>> queue using an rb-tree. >> Thanks Vivek. 
All sounds reasonable to me and I think this is be the right way >> to proceed. >> >> I'll try to design and implement your rb-tree per request-queue idea into my >> io-throttle controller, maybe we can reuse it also for a more generic solution. >> Feel free to send me your experimental proof of concept if you want, even if >> it's not yet complete, I can review it, test and contribute. > > Currently I have taken code from bio-cgroup to implement cgroups and to > provide functionality to associate a bio to a cgroup. I need this to be > able to queue the bio's at right node in the rb-tree and then also to be > able to take a decision when is the right time to release few requests. > > Right now in crude implementation, I am working on making system boot. > Once patches are at least in little bit working shape, I will send it to you > to have a look. > > Thanks > Vivek I wonder... wouldn't be simpler to just use the memory controller to retrieve this information starting from struct page? I mean, following this path (in short, obviously using the appropriate interfaces for locking and referencing the different objects): cgrp = page->page_cgroup->mem_cgroup->css.cgroup Once you get the cgrp it's very easy to use the corresponding controller structure. Actually, this is how I'm doing in cgroup-io-throttle to associate a bio to a cgroup. What other functionalities/advantages bio-cgroup provide in addition to that? Thanks, -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <48D2715A.6060002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48D2715A.6060002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-18 16:20 ` Vivek Goyal 2008-09-19 3:34 ` [dm-devel] " Hirokazu Takahashi 1 sibling, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-18 16:20 UTC (permalink / raw) To: Andrea Righi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote: > Vivek Goyal wrote: > > On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. 
> >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. 
I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. > >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > Andrea, Ok, so you first retrieve the cgroup associated with the page owner and then retrieve the respective iothrottle state using that cgroup (cgroup_to_iothrottle). I have yet to dive deeper into cgroup data structures, but does it work if iothrottle and the memory controller are mounted on separate hierarchies? The bio-cgroup folks are doing something similar, in the sense that they retrieve the relevant pointer through page and page_cgroup and use that to reach the bio_cgroup structure.
The difference is that they don't first retrieve the css object of mem_cgroup; instead, they store the bio_cgroup pointer directly in page_cgroup (when the page is being charged in the memory controller). While the page is being charged, they determine the bio_cgroup associated with the task and store this info in page->page_cgroup->bio_cgroup: static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct *p) { return container_of(task_subsys_state(p, bio_cgroup_subsys_id), struct bio_cgroup, css); } At any later point, one can look at a bio and reach the respective bio_cgroup via bio->page->page_cgroup->bio_cgroup. Looks like we are now getting rid of the page_cgroup pointer in "struct page", and we shall have to change the implementation accordingly. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48D2715A.6060002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2008-09-18 16:20 ` Vivek Goyal @ 2008-09-19 3:34 ` Hirokazu Takahashi 1 sibling, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 3:34 UTC (permalink / raw) To: righi.andrea-Re5JQEeQqe8AvxtiuMwx3w, dm-devel-H+wXaHxf7aLQT0dZR+AlfA Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, agk-9JcytcrH/bA+uJoB2kUjGw, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, fernando-gVGce1chcLdL9jVzuh4AOg, xemul-GEFAQzZX7r8dnm+yROfE0A Hi, > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? 
> >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. 
I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. > >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > > Once you get the cgrp it's very easy to use the corresponding controller > structure. > > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > to a cgroup. What other functionalities/advantages bio-cgroup provide in > addition to that? I've decided to get Ryo to post the accurate dirty-page tracking patch for bio-cgroup, which isn't perfect yet though. The memory controller never wants to support this tracking because migrating a page between memory cgroups is really heavy. 
I also thought enhancing the memory controller would be good enough, but a lot of people said they wanted to control the memory resource and the block I/O resource separately. So you can create several bio-cgroups in one memory-cgroup, or you can use bio-cgroup without memory-cgroup. I also have a plan to implement a more accurate tracking mechanism in bio-cgroup after the memory cgroup team re-implements the infrastructure, which won't be supported by memory-cgroup. When a process is moved into another memory cgroup, the pages belonging to the process don't move to the new cgroup because migrating pages is so heavy. It's hard to find the pages from the process, and migrating pages may cause some memory pressure. I'll implement this feature only in bio-cgroup, with minimum overhead. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 15:18 ` Andrea Righi @ 2008-09-18 16:20 ` Vivek Goyal -1 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-18 16:20 UTC (permalink / raw) To: Andrea Righi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, balbir On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote: > Vivek Goyal wrote: > > On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. 
Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. 
> >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. > >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > Andrea, Ok, you are first retrieving cgroup associated page owner and then retrieving repsective iothrottle state using that cgroup, (cgroup_to_iothrottle). I have yet to dive deeper into cgroup data structures but does it work if iothrottle and memory controller are mounted on separate hierarchies? bio-cgroup guys are also doing similar thing in the sense retrieving relevant pointer through page and page_cgroup and use that to reach bio_cgroup strucutre. The difference is that they don't retrieve first css object of mem_cgroup instead they directly store the pointer of bio_cgroup in page_cgroup (When page is being charged in memory controller). While page is being charged, determine the bio_cgroup, associated with the task and store this info in page->page_cgroup->bio_cgroup. 
static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct *p) { return container_of(task_subsys_state(p, bio_cgroup_subsys_id), struct bio_cgroup, css); } At any later point, one can look at bio and reach respective bio_cgroup by. bio->page->page_cgroup->bio_cgroup. Looks like now we are getting rid of page_cgroup pointer in "struct page" and we shall have to change the implementation accordingly. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks @ 2008-09-18 16:20 ` Vivek Goyal 0 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-18 16:20 UTC (permalink / raw) To: Andrea Righi Cc: Ryo Tsuruta, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote: > Vivek Goyal wrote: > > On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. 
Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. 
> >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. > >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > Andrea, Ok, you are first retrieving cgroup associated page owner and then retrieving repsective iothrottle state using that cgroup, (cgroup_to_iothrottle). I have yet to dive deeper into cgroup data structures but does it work if iothrottle and memory controller are mounted on separate hierarchies? bio-cgroup guys are also doing similar thing in the sense retrieving relevant pointer through page and page_cgroup and use that to reach bio_cgroup strucutre. The difference is that they don't retrieve first css object of mem_cgroup instead they directly store the pointer of bio_cgroup in page_cgroup (When page is being charged in memory controller). While page is being charged, determine the bio_cgroup, associated with the task and store this info in page->page_cgroup->bio_cgroup. 
static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct *p)
{
	return container_of(task_subsys_state(p, bio_cgroup_subsys_id),
			    struct bio_cgroup, css);
}

At any later point, one can look at a bio and reach the respective bio_cgroup via bio->page->page_cgroup->bio_cgroup. It looks like the page_cgroup pointer in "struct page" is now being removed, so we shall have to change the implementation accordingly. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 16:20 ` Vivek Goyal (?) @ 2008-09-18 19:54 ` Andrea Righi -1 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-18 19:54 UTC (permalink / raw) To: Vivek Goyal Cc: Ryo Tsuruta, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe Vivek Goyal wrote: > On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote: >> Vivek Goyal wrote: >>> On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: >>>> Vivek Goyal wrote: >>>>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: >>>>>> Hi All, >>>>>> >>>>>> I have got excellent results of dm-ioband, that controls the disk I/O >>>>>> bandwidth even when it accepts delayed write requests. >>>>>> >>>>>> In this time, I ran some benchmarks with a high-end storage. The >>>>>> reason was to avoid a performance bottleneck due to mechanical factors >>>>>> such as seek time. >>>>>> >>>>>> You can see the details of the benchmarks at: >>>>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ >>>>>> >>>>> Hi Ryo, >>>>> >>>>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break >>>>> the notion of process priority in CFQ because now dm-ioband device will >>>>> hold the bio and issue these to lower layers later based on which bio's >>>>> become ready. Hence actual bio submitting context might be different and >>>>> because cfq derives the io_context from current task, it will be broken. >>>>> >>>>> To mitigate that problem, we probably need to implement Fernando's >>>>> suggestion of putting io_context pointer in bio. >>>>> >>>>> Have you already done something to solve this issue? >>>>> >>>>> Secondly, why do we have to create an additional dm-ioband device for >>>>> every device we want to control using rules. This looks little odd >>>>> atleast to me. 
Can't we keep it in line with rest of the controllers >>>>> where task grouping takes place using cgroup and rules are specified in >>>>> cgroup itself (The way Andrea Righi does for io-throttling patches)? >>>>> >>>>> To avoid creation of stacking another device (dm-ioband) on top of every >>>>> device we want to subject to rules, I was thinking of maintaining an >>>>> rb-tree per request queue. Requests will first go into this rb-tree upon >>>>> __make_request() and then will filter down to elevator associated with the >>>>> queue (if there is one). This will provide us the control of releasing >>>>> bio's to elevaor based on policies (proportional weight, max bandwidth >>>>> etc) and no need of stacking additional block device. >>>>> >>>>> I am working on some experimental proof of concept patches. It will take >>>>> some time though. >>>>> >>>>> I was thinking of following. >>>>> >>>>> - Adopt the Andrea Righi's style of specifying rules for devices and >>>>> group the tasks using cgroups. >>>>> >>>>> - To begin with, adopt dm-ioband's approach of proportional bandwidth >>>>> controller. It makes sense to me limit the bandwidth usage only in >>>>> case of contention. If there is really a need to limit max bandwidth, >>>>> then probably we can do something to implement additional rules or >>>>> implement some policy switcher where user can decide what kind of >>>>> policies need to be implemented. >>>>> >>>>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every >>>>> request queue which is controlled by some kind of cgroup rules. >>>>> >>>>> It would be good to discuss above approach now whether it makes sense or >>>>> not. I think it is kind of fusion of io-throttling and dm-ioband patches >>>>> with additional idea of doing io-control just above elevator on the request >>>>> queue using an rb-tree. >>>> Thanks Vivek. All sounds reasonable to me and I think this is be the right way >>>> to proceed. 
>>>> >>>> I'll try to design and implement your rb-tree per request-queue idea into my >>>> io-throttle controller, maybe we can reuse it also for a more generic solution. >>>> Feel free to send me your experimental proof of concept if you want, even if >>>> it's not yet complete, I can review it, test and contribute. >>> Currently I have taken code from bio-cgroup to implement cgroups and to >>> provide functionality to associate a bio to a cgroup. I need this to be >>> able to queue the bio's at right node in the rb-tree and then also to be >>> able to take a decision when is the right time to release few requests. >>> >>> Right now in crude implementation, I am working on making system boot. >>> Once patches are at least in little bit working shape, I will send it to you >>> to have a look. >>> >>> Thanks >>> Vivek >> I wonder... wouldn't be simpler to just use the memory controller >> to retrieve this information starting from struct page? >> >> I mean, following this path (in short, obviously using the appropriate >> interfaces for locking and referencing the different objects): >> >> cgrp = page->page_cgroup->mem_cgroup->css.cgroup >> > > Andrea, > > Ok, you are first retrieving cgroup associated page owner and then > retrieving repsective iothrottle state using that > cgroup, (cgroup_to_iothrottle). I have yet to dive deeper into cgroup Correct. > data structures but does it work if iothrottle and memory controller > are mounted on separate hierarchies? ehm... I've to check. I usually mount all the controllers into the same hierarchy. :P > bio-cgroup guys are also doing similar thing in the sense retrieving > relevant pointer through page and page_cgroup and use that to reach > bio_cgroup strucutre. The difference is that they don't retrieve first > css object of mem_cgroup instead they directly store the pointer of > bio_cgroup in page_cgroup (When page is being charged in memory controller). 
> > While page is being charged, determine the bio_cgroup, associated with > the task and store this info in page->page_cgroup->bio_cgroup. > > static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct > *p) > { > return container_of(task_subsys_state(p, bio_cgroup_subsys_id), > struct bio_cgroup, css); > } > > At any later point, one can look at bio and reach respective bio_cgroup > by. > > bio->page->page_cgroup->bio_cgroup. > > Looks like now we are getting rid of page_cgroup pointer in "struct page" > and we shall have to change the implementation accordingly. Actually, only the page_get_page_cgroup() implementation would change. And we don't have to worry about the particular implementation (hash, radix tree, whatever...); in any case, bio-cgroup simply has to use the appropriate interface: page_get_page_cgroup(struct page *). -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 15:18 ` Andrea Righi @ 2008-09-19 3:34 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 3:34 UTC (permalink / raw) To: righi.andrea, dm-devel Cc: xen-devel, containers, agk, linux-kernel, virtualization, jens.axboe, ryov, balbir, fernando, vgoyal, xemul Hi, > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? 
> >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. 
> >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > > Once you get the cgrp it's very easy to use the corresponding controller > structure. > > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > to a cgroup. What other functionalities/advantages bio-cgroup provide in > addition to that? I've decided to get Ryo to post the accurate dirty-page tracking patch for bio-cgroup, though it isn't perfect yet. The memory controller never wants to support this tracking, because migrating a page between memory cgroups is really heavy. I also thought enhancing the memory controller would be good enough, but a lot of people said they wanted to control memory resources and block I/O resources separately. So you can create several bio-cgroups in one memory cgroup, or you can use bio-cgroup without the memory cgroup. I also have a plan to implement a more accurate tracking mechanism in bio-cgroup after the memory cgroup team re-implements the infrastructure; this mechanism won't be supported by the memory cgroup. When a process is moved into another memory cgroup, the pages belonging to the process don't move to the new cgroup, because migrating pages is so heavy. It's hard to find the pages belonging to the process, and migrating them may cause some memory pressure. I'll implement this feature only in bio-cgroup, with minimum overhead. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks @ 2008-09-19 3:34 ` Hirokazu Takahashi 0 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 3:34 UTC (permalink / raw) To: righi.andrea, dm-devel Cc: vgoyal, ryov, xen-devel, containers, jens.axboe, linux-kernel, virtualization, agk, xemul, fernando, balbir Hi, > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? 
> >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. 
> >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > > Once you get the cgrp it's very easy to use the corresponding controller > structure. > > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > to a cgroup. What other functionalities/advantages bio-cgroup provide in > addition to that? I've decided to get Ryo to post the accurate dirty-page tracking patch for bio-cgroup, which isn't perfect yet though. The memory controller never wants to support this tracking because migrating a page between memory cgroups is really heavy. I also thought enhancing the memory controller would be good enough, but a lot of people said they wanted to control memory resource and block I/O resource separately. So you can create several bio-cgroup in one memory-cgroup, or you can use bio-cgroup without memory-cgroup. I also have a plan to implement more acurate tracking mechanism on bio-cgroup after the memory cgroup team re-implement the infrastructure, which won't be supported by memory-cgroup. 
When a process is moved into another memory cgroup, the pages belonging to the process don't move to the new cgroup because migrating pages is so heavy. It's hard to find the pages from the process, and migrating pages may cause some memory pressure. I'll implement this feature only on bio-cgroup, with minimum overhead. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 3:34 ` Hirokazu Takahashi (?) @ 2008-09-20 4:27 ` KAMEZAWA Hiroyuki 2008-09-20 5:18 ` Balbir Singh ` (2 more replies) -1 siblings, 3 replies; 140+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-09-20 4:27 UTC (permalink / raw) To: Hirokazu Takahashi Cc: righi.andrea, dm-devel, xen-devel, containers, agk, linux-kernel, virtualization, jens.axboe, balbir, fernando, xemul On Fri, 19 Sep 2008 12:34:05 +0900 (JST) Hirokazu Takahashi <taka@valinux.co.jp> wrote: > I've decided to get Ryo to post the accurate dirty-page tracking patch > for bio-cgroup, which isn't perfect yet though. The memory controller > never wants to support this tracking because migrating a page between > memory cgroups is really heavy. > > I also thought enhancing the memory controller would be good enough, > but a lot of people said they wanted to control memory resource and > block I/O resource separately. > So you can create several bio-cgroup in one memory-cgroup, > or you can use bio-cgroup without memory-cgroup. > > I also have a plan to implement more acurate tracking mechanism > on bio-cgroup after the memory cgroup team re-implement the infrastructure, > which won't be supported by memory-cgroup. > When a process are moved into another memory cgroup, > the pages belonging to the process don't move to the new cgroup > because migrating pages is so heavy. It's hard to find the pages > from the process and migrating pages may cause some memory pressure. > I'll implement this feature only on bio-cgroup with minimum overhead > I really would like to move page_cgroup to the new cgroup when the process moves... But it's just a plan and I'm not sure whether I can do it or not. Anyway, what's next for me is: 1. fix the current discussion to remove the page->page_cgroup pointer. 2. reduce locks. 3. support swap and swap-cache. I think the algorithm for (1) and (2) is now getting smart. Thanks, -Kame ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-20 4:27 ` KAMEZAWA Hiroyuki @ 2008-09-20 5:18 ` Balbir Singh [not found] ` <20080920132703.e74c8f89.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2008-09-20 5:18 ` Balbir Singh 2 siblings, 0 replies; 140+ messages in thread From: Balbir Singh @ 2008-09-20 5:18 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, Hirokazu Takahashi, dm-devel, agk, xemul, fernando, righi.andrea KAMEZAWA Hiroyuki wrote: > On Fri, 19 Sep 2008 12:34:05 +0900 (JST) > Hirokazu Takahashi <taka@valinux.co.jp> wrote: > >> I've decided to get Ryo to post the accurate dirty-page tracking patch >> for bio-cgroup, which isn't perfect yet though. The memory controller >> never wants to support this tracking because migrating a page between >> memory cgroups is really heavy. >> >> I also thought enhancing the memory controller would be good enough, >> but a lot of people said they wanted to control memory resource and >> block I/O resource separately. >> So you can create several bio-cgroup in one memory-cgroup, >> or you can use bio-cgroup without memory-cgroup. >> >> I also have a plan to implement more acurate tracking mechanism >> on bio-cgroup after the memory cgroup team re-implement the infrastructure, >> which won't be supported by memory-cgroup. >> When a process are moved into another memory cgroup, >> the pages belonging to the process don't move to the new cgroup >> because migrating pages is so heavy. It's hard to find the pages >> from the process and migrating pages may cause some memory pressure. >> I'll implement this feature only on bio-cgroup with minimum overhead >> > I really would like to move page_cgroup to new cgroup when the process moves... > But it's just in my plan and I'm not sure I can do it or not. 
> Kamezawa-San, I am not dead against it, but I would provide a knob/control point for the system administrator to decide if movement is important for applications, and then let them do so (like force_empty). > Anyway what's next for me is > 1. fix current discussion to remove page->page_cgroup pointer. > 2. reduce locks. Are you planning on reposting these? I've been trying other approaches at my end: 1. Use a radix tree per-node per-zone 2. Use radix trees only for 32-bit systems 3. Depend on CONFIG_HAVE_MEMORY_PRESENT and build a sparse data structure and use pre-allocation I've posted (1) and I'll take a look at your patches as well. > 3. support swap and swap-cache. > > I think algorithm for (1), (2) is now getting smart. > Yes, it is getting better. > Thanks, > -Kame > -- Balbir ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-20 5:18 ` Balbir Singh @ 2008-09-20 9:25 ` KAMEZAWA Hiroyuki [not found] ` <48D48789.8000606-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2008-09-20 9:25 ` KAMEZAWA Hiroyuki 2 siblings, 0 replies; 140+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-09-20 9:25 UTC (permalink / raw) To: balbir Cc: Hirokazu Takahashi, righi.andrea, dm-devel, xen-devel, containers, agk, linux-kernel, virtualization, jens.axboe, fernando, xemul On Fri, 19 Sep 2008 22:18:01 -0700 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Kamezawa-San, I am not dead against it, but I would provide a knob/control point > for system administrator to decide if movement is important for applications, > then let them do so (like force_empty). > Makes sense. > > Anyway what's next for me is > > 1. fix current discussion to remove page->page_cgroup pointer. > > 2. reduce locks. > > Are you planning on reposting these. I've been trying other approaches at my end > I'll post next Monday. It's obvious that I should do more tests/fixes... About performance, I'll give it up at some reasonable point. > 1. Use radix tree per-node per-zone > 2. Use radix trees only for 32 bit systems > 3. Depend on CONFIG_HAVE_MEMORY_PRESENT and build a sparse data structure and > use pre-allocation > > I've posted (1) and I'll take a look at your patches as well > My patch has (many) bugs. Several are fixed but there will still be some ;) SwapCache beats me again because it easily reuses uncharged pages... BTW, why do you like radix-tree? It's not very good for our purpose. FLATMEM support for small systems will be easy work. > > 3. support swap and swap-cache. > > > > I think algorithm for (1), (2) is now getting smart. > > > > Yes, it is getting better > Thanks, -Kame ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [Xen-devel] Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 3:34 ` Hirokazu Takahashi ` (2 preceding siblings ...) (?) @ 2008-09-24 11:04 ` Balbir Singh -1 siblings, 0 replies; 140+ messages in thread From: Balbir Singh @ 2008-09-24 11:04 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, righi.andrea [-- Attachment #1.1: Type: text/plain, Size: 6827 bytes --] On Fri, Sep 19, 2008 at 9:04 AM, Hirokazu Takahashi <taka@valinux.co.jp>wrote: > Hi, > > > >> Vivek Goyal wrote: > > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > > >>>> Hi All, > > >>>> > > >>>> I have got excellent results of dm-ioband, that controls the disk > I/O > > >>>> bandwidth even when it accepts delayed write requests. > > >>>> > > >>>> In this time, I ran some benchmarks with a high-end storage. The > > >>>> reason was to avoid a performance bottleneck due to mechanical > factors > > >>>> such as seek time. > > >>>> > > >>>> You can see the details of the benchmarks at: > > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/<http://people.valinux.co.jp/%7Eryov/dm-ioband/hps/> > > >>>> > > >>> Hi Ryo, > > >>> > > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will > break > > >>> the notion of process priority in CFQ because now dm-ioband device > will > > >>> hold the bio and issue these to lower layers later based on which > bio's > > >>> become ready. Hence actual bio submitting context might be different > and > > >>> because cfq derives the io_context from current task, it will be > broken. > > >>> > > >>> To mitigate that problem, we probably need to implement Fernando's > > >>> suggestion of putting io_context pointer in bio. > > >>> > > >>> Have you already done something to solve this issue? 
> > >>> > > >>> Secondly, why do we have to create an additional dm-ioband device for > > >>> every device we want to control using rules. This looks little odd > > >>> atleast to me. Can't we keep it in line with rest of the controllers > > >>> where task grouping takes place using cgroup and rules are specified > in > > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > > >>> > > >>> To avoid creation of stacking another device (dm-ioband) on top of > every > > >>> device we want to subject to rules, I was thinking of maintaining an > > >>> rb-tree per request queue. Requests will first go into this rb-tree > upon > > >>> __make_request() and then will filter down to elevator associated > with the > > >>> queue (if there is one). This will provide us the control of > releasing > > >>> bio's to elevaor based on policies (proportional weight, max > bandwidth > > >>> etc) and no need of stacking additional block device. > > >>> > > >>> I am working on some experimental proof of concept patches. It will > take > > >>> some time though. > > >>> > > >>> I was thinking of following. > > >>> > > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > > >>> group the tasks using cgroups. > > >>> > > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > > >>> controller. It makes sense to me limit the bandwidth usage only in > > >>> case of contention. If there is really a need to limit max > bandwidth, > > >>> then probably we can do something to implement additional rules or > > >>> implement some policy switcher where user can decide what kind of > > >>> policies need to be implemented. > > >>> > > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on > every > > >>> request queue which is controlled by some kind of cgroup rules. > > >>> > > >>> It would be good to discuss above approach now whether it makes sense > or > > >>> not. 
I think it is kind of fusion of io-throttling and dm-ioband > patches > > >>> with additional idea of doing io-control just above elevator on the > request > > >>> queue using an rb-tree. > > >> Thanks Vivek. All sounds reasonable to me and I think this is be the > right way > > >> to proceed. > > >> > > >> I'll try to design and implement your rb-tree per request-queue idea > into my > > >> io-throttle controller, maybe we can reuse it also for a more generic > solution. > > >> Feel free to send me your experimental proof of concept if you want, > even if > > >> it's not yet complete, I can review it, test and contribute. > > > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > > provide functionality to associate a bio to a cgroup. I need this to be > > > able to queue the bio's at right node in the rb-tree and then also to > be > > > able to take a decision when is the right time to release few requests. > > > > > > Right now in crude implementation, I am working on making system boot. > > > Once patches are at least in little bit working shape, I will send it > to you > > > to have a look. > > > > > > Thanks > > > Vivek > > > > I wonder... wouldn't be simpler to just use the memory controller > > to retrieve this information starting from struct page? > > > > I mean, following this path (in short, obviously using the appropriate > > interfaces for locking and referencing the different objects): > > > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > > > > Once you get the cgrp it's very easy to use the corresponding controller > > structure. > > > > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > > to a cgroup. What other functionalities/advantages bio-cgroup provide in > > addition to that? > > I've decided to get Ryo to post the accurate dirty-page tracking patch > for bio-cgroup, which isn't perfect yet though. 
The memory controller > never wants to support this tracking because migrating a page between > memory cgroups is really heavy. > It depends on the migration. The cost is proportional to the number of pages moved. The cost can be brought down (I do have a design on paper -- from long long ago), where moving mm's will reduce the cost of migration, but it adds an additional dereference in the common path. > > I also thought enhancing the memory controller would be good enough, > but a lot of people said they wanted to control memory resource and > block I/O resource separately. Yes, ideally we do want that. > > So you can create several bio-cgroup in one memory-cgroup, > or you can use bio-cgroup without memory-cgroup. > > I also have a plan to implement more acurate tracking mechanism > on bio-cgroup after the memory cgroup team re-implement the infrastructure, > which won't be supported by memory-cgroup. > When a process are moved into another memory cgroup, > the pages belonging to the process don't move to the new cgroup > because migrating pages is so heavy. It's hard to find the pages > from the process and migrating pages may cause some memory pressure. > I'll implement this feature only on bio-cgroup with minimum overhead > Kamezawa has also wanted the page migration feature and we've agreed to provide a per-cgroup flag to decide to turn migration on/off. I would not mind refactoring memcontrol.c if that can help the IO controller and if you want migration, force the migration flag to on and warn the user if they try to turn it off. Balbir [-- Attachment #1.2: Type: text/html, Size: 8820 bytes --] [-- Attachment #2: Type: text/plain, Size: 184 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [Xen-devel] Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks
  2008-09-24 11:07 ` [Xen-devel] " Balbir Singh
@ 2008-09-26 10:54 ` Hirokazu Takahashi
  0 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-26 10:54 UTC (permalink / raw)
To: balbir
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, agk, xemul, fernando, vgoyal, righi.andrea

Hi,

> >> > > Currently I have taken code from bio-cgroup to implement cgroups
> >> > > and to provide functionality to associate a bio with a cgroup. I
> >> > > need this to be able to queue the bio's at the right node in the
> >> > > rb-tree and then also to be able to decide when is the right time
> >> > > to release a few requests.
> >> > >
> >> > > Right now, in a crude implementation, I am working on making the
> >> > > system boot. Once the patches are at least in a little bit working
> >> > > shape, I will send them to you to have a look.
> >> > >
> >> > > Thanks
> >> > > Vivek
> >> >
> >> > I wonder... wouldn't it be simpler to just use the memory controller
> >> > to retrieve this information starting from struct page?
> >> >
> >> > I mean, following this path (in short, obviously using the appropriate
> >> > interfaces for locking and referencing the different objects):
> >> >
> >> > cgrp = page->page_cgroup->mem_cgroup->css.cgroup
> >> >
> >> > Once you get the cgrp it's very easy to use the corresponding
> >> > controller structure.
> >> >
> >> > Actually, this is how I'm doing it in cgroup-io-throttle to associate
> >> > a bio with a cgroup. What other functionalities/advantages does
> >> > bio-cgroup provide in addition to that?
> >>
> >> I've decided to get Ryo to post the accurate dirty-page tracking patch
> >> for bio-cgroup, which isn't perfect yet though. The memory controller
> >> never wants to support this tracking because migrating a page between
> >> memory cgroups is really heavy.
>
> It depends on the migration. The cost is proportional to the number of
> pages moved. The cost can be brought down (I do have a design on
> paper -- from long long ago), where moving mm's will reduce the cost
> of migration, but it adds an additional dereference in the common
> path.

Okay, this will help to track anonymous pages even after processes are
migrated between memory-cgroups.

The rest of my concern is pages in the pagecache, which might potentially
be dirtied by processes in other cgroups. I think bio-cgroup should also
handle this case.

> >> I also thought enhancing the memory controller would be good enough,
> >> but a lot of people said they wanted to control the memory resource
> >> and the block I/O resource separately.
> >
> > Yes, ideally we do want that.
> >
> >> So you can create several bio-cgroups in one memory-cgroup,
> >> or you can use bio-cgroup without memory-cgroup.
> >>
> >> I also have a plan to implement a more accurate tracking mechanism
> >> on bio-cgroup after the memory cgroup team re-implements the
> >> infrastructure, which won't be supported by memory-cgroup.
> >> When a process is moved into another memory cgroup,
> >> the pages belonging to the process don't move to the new cgroup
> >> because migrating pages is so heavy. It's hard to find the pages
> >> from the process, and migrating pages may cause some memory pressure.
> >> I'll implement this feature only on bio-cgroup with minimum overhead.
>
> Kamezawa has also wanted the page migration feature and we've agreed to
> provide a per-cgroup flag to decide whether to turn migration on/off. I
> would not mind refactoring memcontrol.c if that can help the IO
> controller; if you want migration, force the migration flag on and warn
> the user if they try to turn it off.

Good news!

But I've been wondering whether the IO controller should have the same
feature. Once Kamezawa-san finishes implementing the new page_cgroup
infrastructure, which pre-allocates all the memory it needs, I think I
can minimize the cost of migrating pages between bio-cgroups, since this
migration won't cause any page reclaim, unlike that of memory-cgroup.
In this case I might design it so that it only moves pages between
bio-cgroups, while not moving them between memory-cgroups.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 11:07 ` [Xen-devel] " Balbir Singh @ 2008-09-26 10:54 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-26 10:54 UTC (permalink / raw) To: balbir Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, ryov, xemul, fernando, vgoyal, righi.andrea Hi, > >> > > Currently I have taken code from bio-cgroup to implement cgroups and > >> > > to > >> > > provide functionality to associate a bio to a cgroup. I need this to > >> > > be > >> > > able to queue the bio's at right node in the rb-tree and then also to > >> > > be > >> > > able to take a decision when is the right time to release few > >> > > requests. > >> > > > >> > > Right now in crude implementation, I am working on making system boot. > >> > > Once patches are at least in little bit working shape, I will send it > >> > > to you > >> > > to have a look. > >> > > > >> > > Thanks > >> > > Vivek > >> > > >> > I wonder... wouldn't be simpler to just use the memory controller > >> > to retrieve this information starting from struct page? > >> > > >> > I mean, following this path (in short, obviously using the appropriate > >> > interfaces for locking and referencing the different objects): > >> > > >> > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > >> > > >> > Once you get the cgrp it's very easy to use the corresponding controller > >> > structure. > >> > > >> > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > >> > to a cgroup. What other functionalities/advantages bio-cgroup provide in > >> > addition to that? > >> > >> I've decided to get Ryo to post the accurate dirty-page tracking patch > >> for bio-cgroup, which isn't perfect yet though. The memory controller > >> never wants to support this tracking because migrating a page between > >> memory cgroups is really heavy. > > It depends on the migration. 
The cost is proportional to the number of > pages moved. The cost can be brought down (I do have a design on > paper -- from long long ago), where moving mm's will reduce the cost > of migration, but it adds an additional dereference in the common > path. Okay, this will help to track anonymous pages even after processes are migrated between memory-cgroups. The rest of my concern is pages in the pagecache, which might be potentially dirtied by processes in other cgroups. I think bio-cgroups should also care this case. > >> I also thought enhancing the memory controller would be good enough, > >> but a lot of people said they wanted to control memory resource and > >> block I/O resource separately. > > > > Yes, ideally we do want that. > > > >> > >> So you can create several bio-cgroup in one memory-cgroup, > >> or you can use bio-cgroup without memory-cgroup. > >> > >> I also have a plan to implement more acurate tracking mechanism > >> on bio-cgroup after the memory cgroup team re-implement the > >> infrastructure, > >> which won't be supported by memory-cgroup. > >> When a process are moved into another memory cgroup, > >> the pages belonging to the process don't move to the new cgroup > >> because migrating pages is so heavy. It's hard to find the pages > >> from the process and migrating pages may cause some memory pressure. > >> I'll implement this feature only on bio-cgroup with minimum overhead > > > > Kamezawa has also wanted the page migration feature and we've agreed > to provide a per-cgroup flag to decide to turn migration on/off. I > would not mind refactoring memcontrol.c if that can help the IO > controller and if you want migration, force the migration flag to on > and warn the user if they try to turn it off. Good news! But I've been wondering whether the IO controller should have the same feature. 
Once Kamezawa-san finishes implementing the new page_cgroup infrastructure, which pre-allocates all the memory it needs, I think I can minimize the cost of migrating pages between bio-cgroups, since this migration won't cause any page reclaim, unlike that of memory-cgroup. In this case I might design it so that it only moves pages between bio-cgroups while not moving them between memory-cgroups. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [Xen-devel] Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 11:04 ` [Xen-devel] " Balbir Singh [not found] ` <661de9470809240404i62300942o15337ecec335fe22-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-09-24 11:07 ` [Xen-devel] " Balbir Singh @ 2008-09-24 11:07 ` Balbir Singh 2 siblings, 0 replies; 140+ messages in thread From: Balbir Singh @ 2008-09-24 11:07 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, righi.andrea On Wed, Sep 24, 2008 at 4:34 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > On Fri, Sep 19, 2008 at 9:04 AM, Hirokazu Takahashi <taka@valinux.co.jp> > wrote: >> >> Hi, >> >> > >> Vivek Goyal wrote: >> > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: >> > >>>> Hi All, >> > >>>> >> > >>>> I have got excellent results of dm-ioband, that controls the disk >> > >>>> I/O >> > >>>> bandwidth even when it accepts delayed write requests. >> > >>>> >> > >>>> In this time, I ran some benchmarks with a high-end storage. The >> > >>>> reason was to avoid a performance bottleneck due to mechanical >> > >>>> factors >> > >>>> such as seek time. >> > >>>> >> > >>>> You can see the details of the benchmarks at: >> > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ >> > >>>> >> > >>> Hi Ryo, >> > >>> >> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will >> > >>> break >> > >>> the notion of process priority in CFQ because now dm-ioband device >> > >>> will >> > >>> hold the bio and issue these to lower layers later based on which >> > >>> bio's >> > >>> become ready. Hence actual bio submitting context might be different >> > >>> and >> > >>> because cfq derives the io_context from current task, it will be >> > >>> broken. >> > >>> >> > >>> To mitigate that problem, we probably need to implement Fernando's >> > >>> suggestion of putting io_context pointer in bio. 
>> > >>> >> > >>> Have you already done something to solve this issue? >> > >>> >> > >>> Secondly, why do we have to create an additional dm-ioband device >> > >>> for >> > >>> every device we want to control using rules. This looks little odd >> > >>> atleast to me. Can't we keep it in line with rest of the controllers >> > >>> where task grouping takes place using cgroup and rules are specified >> > >>> in >> > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? >> > >>> >> > >>> To avoid creation of stacking another device (dm-ioband) on top of >> > >>> every >> > >>> device we want to subject to rules, I was thinking of maintaining an >> > >>> rb-tree per request queue. Requests will first go into this rb-tree >> > >>> upon >> > >>> __make_request() and then will filter down to elevator associated >> > >>> with the >> > >>> queue (if there is one). This will provide us the control of >> > >>> releasing >> > >>> bio's to elevaor based on policies (proportional weight, max >> > >>> bandwidth >> > >>> etc) and no need of stacking additional block device. >> > >>> >> > >>> I am working on some experimental proof of concept patches. It will >> > >>> take >> > >>> some time though. >> > >>> >> > >>> I was thinking of following. >> > >>> >> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and >> > >>> group the tasks using cgroups. >> > >>> >> > >>> - To begin with, adopt dm-ioband's approach of proportional >> > >>> bandwidth >> > >>> controller. It makes sense to me limit the bandwidth usage only in >> > >>> case of contention. If there is really a need to limit max >> > >>> bandwidth, >> > >>> then probably we can do something to implement additional rules or >> > >>> implement some policy switcher where user can decide what kind of >> > >>> policies need to be implemented. 
>> > >>> >> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on >> > >>> every >> > >>> request queue which is controlled by some kind of cgroup rules. >> > >>> >> > >>> It would be good to discuss above approach now whether it makes >> > >>> sense or >> > >>> not. I think it is kind of fusion of io-throttling and dm-ioband >> > >>> patches >> > >>> with additional idea of doing io-control just above elevator on the >> > >>> request >> > >>> queue using an rb-tree. >> > >> Thanks Vivek. All sounds reasonable to me and I think this is be the >> > >> right way >> > >> to proceed. >> > >> >> > >> I'll try to design and implement your rb-tree per request-queue idea >> > >> into my >> > >> io-throttle controller, maybe we can reuse it also for a more generic >> > >> solution. >> > >> Feel free to send me your experimental proof of concept if you want, >> > >> even if >> > >> it's not yet complete, I can review it, test and contribute. >> > > >> > > Currently I have taken code from bio-cgroup to implement cgroups and >> > > to >> > > provide functionality to associate a bio to a cgroup. I need this to >> > > be >> > > able to queue the bio's at right node in the rb-tree and then also to >> > > be >> > > able to take a decision when is the right time to release few >> > > requests. >> > > >> > > Right now in crude implementation, I am working on making system boot. >> > > Once patches are at least in little bit working shape, I will send it >> > > to you >> > > to have a look. >> > > >> > > Thanks >> > > Vivek >> > >> > I wonder... wouldn't be simpler to just use the memory controller >> > to retrieve this information starting from struct page? >> > >> > I mean, following this path (in short, obviously using the appropriate >> > interfaces for locking and referencing the different objects): >> > >> > cgrp = page->page_cgroup->mem_cgroup->css.cgroup >> > >> > Once you get the cgrp it's very easy to use the corresponding controller >> > structure. 
>> > >> > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio >> > to a cgroup. What other functionalities/advantages bio-cgroup provide in >> > addition to that? >> >> I've decided to get Ryo to post the accurate dirty-page tracking patch >> for bio-cgroup, which isn't perfect yet though. The memory controller >> never wants to support this tracking because migrating a page between >> memory cgroups is really heavy. It depends on the migration. The cost is proportional to the number of pages moved. The cost can be brought down (I do have a design on paper -- from long long ago), where moving mm's will reduce the cost of migration, but it adds an additional dereference in the common path. > >> >> I also thought enhancing the memory controller would be good enough, >> but a lot of people said they wanted to control memory resource and >> block I/O resource separately. > > Yes, ideally we do want that. > >> >> So you can create several bio-cgroup in one memory-cgroup, >> or you can use bio-cgroup without memory-cgroup. >> >> I also have a plan to implement more acurate tracking mechanism >> on bio-cgroup after the memory cgroup team re-implement the >> infrastructure, >> which won't be supported by memory-cgroup. >> When a process are moved into another memory cgroup, >> the pages belonging to the process don't move to the new cgroup >> because migrating pages is so heavy. It's hard to find the pages >> from the process and migrating pages may cause some memory pressure. >> I'll implement this feature only on bio-cgroup with minimum overhead > Kamezawa has also wanted the page migration feature and we've agreed to provide a per-cgroup flag to decide to turn migration on/off. I would not mind refactoring memcontrol.c if that can help the IO controller and if you want migration, force the migration flag to on and warn the user if they try to turn it off. Balbir ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080919.123405.91829935.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org> @ 2008-09-20 4:27 ` KAMEZAWA Hiroyuki 2008-09-24 11:04 ` [Xen-devel] " Balbir Singh 1 sibling, 0 replies; 140+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-09-20 4:27 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, xemul-GEFAQzZX7r8dnm+yROfE0A, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, fernando-gVGce1chcLdL9jVzuh4AOg, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w On Fri, 19 Sep 2008 12:34:05 +0900 (JST) Hirokazu Takahashi <taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org> wrote: > I've decided to get Ryo to post the accurate dirty-page tracking patch > for bio-cgroup, which isn't perfect yet though. The memory controller > never wants to support this tracking because migrating a page between > memory cgroups is really heavy. > > I also thought enhancing the memory controller would be good enough, > but a lot of people said they wanted to control memory resource and > block I/O resource separately. > So you can create several bio-cgroup in one memory-cgroup, > or you can use bio-cgroup without memory-cgroup. > > I also have a plan to implement more acurate tracking mechanism > on bio-cgroup after the memory cgroup team re-implement the infrastructure, > which won't be supported by memory-cgroup. > When a process are moved into another memory cgroup, > the pages belonging to the process don't move to the new cgroup > because migrating pages is so heavy. It's hard to find the pages > from the process and migrating pages may cause some memory pressure. 
> I'll implement this feature only on bio-cgroup with minimum overhead > I really would like to move page_cgroup to the new cgroup when the process moves... But it's just in my plan and I'm not sure whether I can do it or not. Anyway, what's next for me is: 1. settle the current discussion about removing the page->page_cgroup pointer. 2. reduce locks. 3. support swap and swap-cache. I think the algorithms for (1) and (2) are now getting smart. Thanks, -Kame ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 15:18 ` Andrea Righi ` (3 preceding siblings ...) (?) @ 2008-09-19 3:34 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 3:34 UTC (permalink / raw) To: righi.andrea, dm-devel Cc: xen-devel, containers, agk, linux-kernel, virtualization, jens.axboe, balbir, fernando, vgoyal, xemul Hi, > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? 
> >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. 
> >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > > Once you get the cgrp it's very easy to use the corresponding controller > structure. > > Actually, this is how I'm doing in cgroup-io-throttle to associate a bio > to a cgroup. What other functionalities/advantages bio-cgroup provide in > addition to that? I've decided to get Ryo to post the accurate dirty-page tracking patch for bio-cgroup, which isn't perfect yet though. The memory controller never wants to support this tracking because migrating a page between memory cgroups is really heavy. I also thought enhancing the memory controller would be good enough, but a lot of people said they wanted to control memory resource and block I/O resource separately. So you can create several bio-cgroup in one memory-cgroup, or you can use bio-cgroup without memory-cgroup. I also have a plan to implement more acurate tracking mechanism on bio-cgroup after the memory cgroup team re-implement the infrastructure, which won't be supported by memory-cgroup. 
When a process is moved into another memory cgroup, the pages belonging to the process don't move to the new cgroup, because migrating pages is so heavy: it's hard to find the pages owned by the process, and migrating them may cause memory pressure. I'll implement this feature only in bio-cgroup, with minimal overhead. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48D267B5.20402-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-18 15:06 ` Vivek Goyal 0 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-18 15:06 UTC (permalink / raw) To: Andrea Righi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: > Vivek Goyal wrote: > > On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >> Hi All, > >> > >> I have got excellent results of dm-ioband, that controls the disk I/O > >> bandwidth even when it accepts delayed write requests. > >> > >> In this time, I ran some benchmarks with a high-end storage. The > >> reason was to avoid a performance bottleneck due to mechanical factors > >> such as seek time. > >> > >> You can see the details of the benchmarks at: > >> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >> > > > > Hi Ryo, > > > > I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > > the notion of process priority in CFQ because now dm-ioband device will > > hold the bio and issue these to lower layers later based on which bio's > > become ready. Hence actual bio submitting context might be different and > > because cfq derives the io_context from current task, it will be broken. > > > > To mitigate that problem, we probably need to implement Fernando's > > suggestion of putting io_context pointer in bio. > > > > Have you already done something to solve this issue? > > > > Secondly, why do we have to create an additional dm-ioband device for > > every device we want to control using rules. 
This looks little odd > > atleast to me. Can't we keep it in line with rest of the controllers > > where task grouping takes place using cgroup and rules are specified in > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > To avoid creation of stacking another device (dm-ioband) on top of every > > device we want to subject to rules, I was thinking of maintaining an > > rb-tree per request queue. Requests will first go into this rb-tree upon > > __make_request() and then will filter down to elevator associated with the > > queue (if there is one). This will provide us the control of releasing > > bio's to elevaor based on policies (proportional weight, max bandwidth > > etc) and no need of stacking additional block device. > > > > I am working on some experimental proof of concept patches. It will take > > some time though. > > > > I was thinking of following. > > > > - Adopt the Andrea Righi's style of specifying rules for devices and > > group the tasks using cgroups. > > > > - To begin with, adopt dm-ioband's approach of proportional bandwidth > > controller. It makes sense to me limit the bandwidth usage only in > > case of contention. If there is really a need to limit max bandwidth, > > then probably we can do something to implement additional rules or > > implement some policy switcher where user can decide what kind of > > policies need to be implemented. > > > > - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > > request queue which is controlled by some kind of cgroup rules. > > > > It would be good to discuss above approach now whether it makes sense or > > not. I think it is kind of fusion of io-throttling and dm-ioband patches > > with additional idea of doing io-control just above elevator on the request > > queue using an rb-tree. > > Thanks Vivek. All sounds reasonable to me and I think this is be the right way > to proceed. 
> > I'll try to design and implement your rb-tree per request-queue idea into my > io-throttle controller, maybe we can reuse it also for a more generic solution. > Feel free to send me your experimental proof of concept if you want, even if > it's not yet complete, I can review it, test and contribute. Currently I have taken code from bio-cgroup to implement cgroups and to provide the functionality to associate a bio with a cgroup. I need this to be able to queue bio's at the right node in the rb-tree, and also to be able to decide when is the right time to release a few requests. Right now it is a crude implementation, and I am working on making the system boot. Once the patches are at least in somewhat working shape, I will send them to you to have a look. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 13:15 ` Vivek Goyal 2008-09-18 14:37 ` Andrea Righi @ 2008-09-18 14:37 ` Andrea Righi 2008-09-19 6:12 ` Hirokazu Takahashi ` (4 subsequent siblings) 6 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-18 14:37 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, balbir Vivek Goyal wrote: > On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: >> Hi All, >> >> I have got excellent results of dm-ioband, that controls the disk I/O >> bandwidth even when it accepts delayed write requests. >> >> In this time, I ran some benchmarks with a high-end storage. The >> reason was to avoid a performance bottleneck due to mechanical factors >> such as seek time. >> >> You can see the details of the benchmarks at: >> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ >> > > Hi Ryo, > > I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > the notion of process priority in CFQ because now dm-ioband device will > hold the bio and issue these to lower layers later based on which bio's > become ready. Hence actual bio submitting context might be different and > because cfq derives the io_context from current task, it will be broken. > > To mitigate that problem, we probably need to implement Fernando's > suggestion of putting io_context pointer in bio. > > Have you already done something to solve this issue? > > Secondly, why do we have to create an additional dm-ioband device for > every device we want to control using rules. This looks little odd > atleast to me. Can't we keep it in line with rest of the controllers > where task grouping takes place using cgroup and rules are specified in > cgroup itself (The way Andrea Righi does for io-throttling patches)? 
> > To avoid creation of stacking another device (dm-ioband) on top of every > device we want to subject to rules, I was thinking of maintaining an > rb-tree per request queue. Requests will first go into this rb-tree upon > __make_request() and then will filter down to elevator associated with the > queue (if there is one). This will provide us the control of releasing > bio's to elevaor based on policies (proportional weight, max bandwidth > etc) and no need of stacking additional block device. > > I am working on some experimental proof of concept patches. It will take > some time though. > > I was thinking of following. > > - Adopt the Andrea Righi's style of specifying rules for devices and > group the tasks using cgroups. > > - To begin with, adopt dm-ioband's approach of proportional bandwidth > controller. It makes sense to me limit the bandwidth usage only in > case of contention. If there is really a need to limit max bandwidth, > then probably we can do something to implement additional rules or > implement some policy switcher where user can decide what kind of > policies need to be implemented. > > - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > request queue which is controlled by some kind of cgroup rules. > > It would be good to discuss above approach now whether it makes sense or > not. I think it is kind of fusion of io-throttling and dm-ioband patches > with additional idea of doing io-control just above elevator on the request > queue using an rb-tree. Thanks Vivek. It all sounds reasonable to me, and I think this is the right way to proceed. I'll try to design and implement your rb-tree per request-queue idea in my io-throttle controller; maybe we can also reuse it for a more generic solution. Feel free to send me your experimental proof of concept if you want; even if it's not yet complete, I can review it, test it and contribute. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 13:15 ` Vivek Goyal @ 2008-09-19 6:12 ` Hirokazu Takahashi 2008-09-18 14:37 ` Andrea Righi ` (4 subsequent siblings) 6 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 6:12 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, ryov, xemul, fernando, balbir Hi, > > Hi All, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > bandwidth even when it accepts delayed write requests. > > > > In this time, I ran some benchmarks with a high-end storage. The > > reason was to avoid a performance bottleneck due to mechanical factors > > such as seek time. > > > > You can see the details of the benchmarks at: > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > Hi Ryo, > > I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > the notion of process priority in CFQ because now dm-ioband device will > hold the bio and issue these to lower layers later based on which bio's > become ready. Hence actual bio submitting context might be different and > because cfq derives the io_context from current task, it will be broken. This is a completely separate problem we have to solve. The CFQ scheduler makes the really bad assumption that the current process must be the owner. This problem occurs when you use some device-mapper devices or use Linux AIO. > To mitigate that problem, we probably need to implement Fernando's > suggestion of putting io_context pointer in bio. > > Have you already done something to solve this issue? Actually, I already have a patch to solve this problem, which makes each bio carry a pointer to the io_context of the owner process. Would you take a look at the thread whose subject is "I/O context inheritance" in: http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2850 Fernando also knows this. Thank you, Hirokazu Takahashi.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 6:12 ` Hirokazu Takahashi (?) @ 2008-09-19 13:12 ` Vivek Goyal -1 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-19 13:12 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir On Fri, Sep 19, 2008 at 03:12:21PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > Hi All, > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > bandwidth even when it accepts delayed write requests. > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > reason was to avoid a performance bottleneck due to mechanical factors > > > such as seek time. > > > > > > You can see the details of the benchmarks at: > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > Hi Ryo, > > > > I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > > the notion of process priority in CFQ because now dm-ioband device will > > hold the bio and issue these to lower layers later based on which bio's > > become ready. Hence actual bio submitting context might be different and > > because cfq derives the io_context from current task, it will be broken. > > This is completely another problem we have to solve. > The CFQ scheduler has really bad assumption that the current process > must be the owner. This problem occurs when you use some of device > mapper devices or use linux aio. > > > To mitigate that problem, we probably need to implement Fernando's > > suggestion of putting io_context pointer in bio. > > > > Have you already done something to solve this issue? > > Actually, I already have a patch to solve this problem, which make > each bio have a pointer to the io_context of the owner process. 
> Would you take a look at the thread whose subject is "I/O context > inheritance" in: > http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2850 > > Fernando also knows this. Great. Sure, I will have a look at this thread. This is something we shall have to implement irrespective of whether we go for the dm-ioband approach or an rb-tree per request queue approach. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080918131554.GB20640-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2008-09-18 14:37 ` Andrea Righi 2008-09-19 6:12 ` Hirokazu Takahashi @ 2008-09-19 11:20 ` Hirokazu Takahashi 2 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 11:20 UTC (permalink / raw) To: vgoyal-H+wXaHxf7aLQT0dZR+AlfA Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Hi, > > Hi All, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > bandwidth even when it accepts delayed write requests. > > > > In this time, I ran some benchmarks with a high-end storage. The > > reason was to avoid a performance bottleneck due to mechanical factors > > such as seek time. > > > > You can see the details of the benchmarks at: > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ (snip) > Secondly, why do we have to create an additional dm-ioband device for > every device we want to control using rules. This looks little odd > atleast to me. Can't we keep it in line with rest of the controllers > where task grouping takes place using cgroup and rules are specified in > cgroup itself (The way Andrea Righi does for io-throttling patches)? It isn't essential that dm-ioband be implemented as one of the device-mappers; I've also been considering that this algorithm itself could be implemented directly in the block layer. The current implementation has merits, though. It is flexible. - Dm-ioband can be placed anywhere you like, which may be right before the I/O schedulers or on top of LVM devices.
- It supports partition-based bandwidth control, which can work without cgroups and is quite easy to use. - It is independent of any I/O scheduler, including ones which will be introduced in the future. I also understand it will be hard to set up without some tools such as the lvm commands. > To avoid creation of stacking another device (dm-ioband) on top of every > device we want to subject to rules, I was thinking of maintaining an > rb-tree per request queue. Requests will first go into this rb-tree upon > __make_request() and then will filter down to elevator associated with the > queue (if there is one). This will provide us the control of releasing > bio's to elevaor based on policies (proportional weight, max bandwidth > etc) and no need of stacking additional block device. I think it's a bit late to control I/O requests there, since processes may be blocked in get_request_wait when the I/O load is high. Please imagine a situation where cgroups with low bandwidths are consuming most of the "struct request"s while another cgroup with a high bandwidth is blocked and can't get enough "struct request"s. It means cgroups that issue lots of I/O requests can win the game. > I am working on some experimental proof of concept patches. It will take > some time though. > > I was thinking of following. > > - Adopt the Andrea Righi's style of specifying rules for devices and > group the tasks using cgroups. > > - To begin with, adopt dm-ioband's approach of proportional bandwidth > controller. It makes sense to me limit the bandwidth usage only in > case of contention. If there is really a need to limit max bandwidth, > then probably we can do something to implement additional rules or > implement some policy switcher where user can decide what kind of > policies need to be implemented. > > - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > request queue which is controlled by some kind of cgroup rules.
> > It would be good to discuss above approach now whether it makes sense or > not. I think it is kind of fusion of io-throttling and dm-ioband patches > with additional idea of doing io-control just above elevator on the request > queue using an rb-tree. > > Thanks > Vivek > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 13:15 ` Vivek Goyal @ 2008-09-19 11:20 ` Hirokazu Takahashi 2008-09-18 14:37 ` Andrea Righi ` (5 subsequent siblings) 6 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 11:20 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, ryov, xemul, fernando, balbir Hi, > > Hi All, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > bandwidth even when it accepts delayed write requests. > > > > In this time, I ran some benchmarks with a high-end storage. The > > reason was to avoid a performance bottleneck due to mechanical factors > > such as seek time. > > > > You can see the details of the benchmarks at: > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ (snip) > Secondly, why do we have to create an additional dm-ioband device for > every device we want to control using rules. This looks little odd > atleast to me. Can't we keep it in line with rest of the controllers > where task grouping takes place using cgroup and rules are specified in > cgroup itself (The way Andrea Righi does for io-throttling patches)? It isn't essential that dm-ioband is implemented as one of the device-mappers. I've also been considering that this algorithm itself could be implemented in the block layer directly. The current implementation has merits, though. It is flexible: - Dm-ioband can be placed anywhere you like, which may be right before the I/O schedulers or on top of LVM devices. - It supports partition-based bandwidth control, which can work without cgroups and is quite easy to use. - It is independent of any I/O scheduler, including ones which will be introduced in the future. I also understand it will be hard to set up without some tools such as the lvm commands. 
> To avoid creation of stacking another device (dm-ioband) on top of every > device we want to subject to rules, I was thinking of maintaining an > rb-tree per request queue. Requests will first go into this rb-tree upon > __make_request() and then will filter down to elevator associated with the > queue (if there is one). This will provide us the control of releasing > bio's to elevaor based on policies (proportional weight, max bandwidth > etc) and no need of stacking additional block device. I think it's a bit late to control I/O requests there, since a process may already be blocked in get_request_wait() when the I/O load is high. Please imagine the situation where cgroups with low bandwidths are consuming most of the "struct request"s while another cgroup with a high bandwidth is blocked and can't get enough of them. It means cgroups that issue lots of I/O requests can win the game. > I am working on some experimental proof of concept patches. It will take > some time though. > > I was thinking of following. > > - Adopt the Andrea Righi's style of specifying rules for devices and > group the tasks using cgroups. > > - To begin with, adopt dm-ioband's approach of proportional bandwidth > controller. It makes sense to me limit the bandwidth usage only in > case of contention. If there is really a need to limit max bandwidth, > then probably we can do something to implement additional rules or > implement some policy switcher where user can decide what kind of > policies need to be implemented. > > - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > request queue which is controlled by some kind of cgroup rules. 
> > Thanks > Vivek > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 11:20 ` Hirokazu Takahashi (?) (?) @ 2008-09-19 13:10 ` Vivek Goyal 2008-09-19 20:28 ` Andrea Righi ` (3 more replies) -1 siblings, 4 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-19 13:10 UTC (permalink / raw) To: Hirokazu Takahashi Cc: ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, righi.andrea, jens.axboe On Fri, Sep 19, 2008 at 08:20:31PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > Hi All, > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > bandwidth even when it accepts delayed write requests. > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > reason was to avoid a performance bottleneck due to mechanical factors > > > such as seek time. > > > > > > You can see the details of the benchmarks at: > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > (snip) > > > Secondly, why do we have to create an additional dm-ioband device for > > every device we want to control using rules. This looks little odd > > atleast to me. Can't we keep it in line with rest of the controllers > > where task grouping takes place using cgroup and rules are specified in > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > It isn't essential dm-band is implemented as one of the device-mappers. > I've been also considering that this algorithm itself can be implemented > in the block layer directly. > > Although, the current implementation has merits. It is flexible. > - Dm-ioband can be place anywhere you like, which may be right before > the I/O schedulers or may be placed on top of LVM devices. Hi, An rb-tree per request queue also should be able to give us this flexibility. Because logic is implemented per request queue, rules can be placed at any layer. 
Either at the bottom-most layer, where requests are passed to the elevator, or at a higher layer, where requests will be passed to lower-level block devices in the stack. We shall just have to modify some of the higher-level dm/md drivers to make use of queuing cgroup requests and releasing cgroup requests to lower layers. > - It supports partition based bandwidth control which can work without > cgroups, which is quite easy to use of. > - It is independent to any I/O schedulers including ones which will > be introduced in the future. This scheme should also be independent of any of the IO schedulers. We might have to make small changes in the IO schedulers to decouple things from __make_request() a bit and insert the rb-tree between __make_request() and the IO scheduler. Otherwise, fundamentally, this approach should not require any major modifications to the IO schedulers. > > I also understand it's will be hard to set up without some tools > such as lvm commands. > That's something I wish to avoid. If we can keep it simple by doing the grouping using cgroups and allowing one-line rules in the cgroup, it would be nice. > > To avoid creation of stacking another device (dm-ioband) on top of every > device we want to subject to rules, I was thinking of maintaining an > rb-tree per request queue. Requests will first go into this rb-tree upon > __make_request() and then will filter down to elevator associated with the > queue (if there is one). This will provide us the control of releasing > bio's to elevaor based on policies (proportional weight, max bandwidth > etc) and no need of stacking additional block device. > > I think it's a bit late to control I/O requests there, since process > may be blocked in get_request_wait when the I/O load is high. > Please imagine the situation that cgroups with low bandwidths are > consuming most of "struct request"s while another cgroup with a high > bandwidth is blocked and can't get enough "struct request"s. 
> > It means cgroups that issues lot of I/O request can win the game. > Ok, this is a good point. Because the number of struct requests is limited and they seem to be allocated on a first-come, first-served basis, a cgroup that is generating a lot of IO might win. But dm-ioband will face the same issue. Essentially it is also a request queue, and it will have a limited number of request descriptors. Have you modified the logic somewhere so that request descriptors are allocated to the waiting processes based on their weights? If yes, the logic probably can be implemented here too. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 13:10 ` Vivek Goyal @ 2008-09-19 20:28 ` Andrea Righi 2008-09-22 9:36 ` Hirokazu Takahashi ` (2 subsequent siblings) 3 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-19 20:28 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, Hirokazu Takahashi, dm-devel, agk, xemul, fernando, balbir Vivek Goyal wrote: > On Fri, Sep 19, 2008 at 08:20:31PM +0900, Hirokazu Takahashi wrote: >>> To avoid creation of stacking another device (dm-ioband) on top of every >>> device we want to subject to rules, I was thinking of maintaining an >>> rb-tree per request queue. Requests will first go into this rb-tree upon >>> __make_request() and then will filter down to elevator associated with the >>> queue (if there is one). This will provide us the control of releasing >>> bio's to elevaor based on policies (proportional weight, max bandwidth >>> etc) and no need of stacking additional block device. >> I think it's a bit late to control I/O requests there, since process >> may be blocked in get_request_wait when the I/O load is high. >> Please imagine the situation that cgroups with low bandwidths are >> consuming most of "struct request"s while another cgroup with a high >> bandwidth is blocked and can't get enough "struct request"s. >> >> It means cgroups that issues lot of I/O request can win the game. >> > > Ok, this is a good point. Because number of struct requests are limited > and they seem to be allocated on first come first serve basis, so if a > cgroup is generating lot of IO, then it might win. > > But dm-ioband will face the same issue. Essentially it is also a request > queue and it will have limited number of request descriptors. Have you > modified the logic somewhere for allocation of request descriptors to the > waiting processes based on their weights? If yes, the logic probably can > be implemented here too. 
Maybe throttling the dirty page ratio in memory could help to avoid this problem. I mean, if a cgroup is exceeding its I/O limits, do ehm... something... also at the balance_dirty_pages() level. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 13:10 ` Vivek Goyal @ 2008-09-22 9:36 ` Hirokazu Takahashi 2008-09-22 9:36 ` Hirokazu Takahashi ` (2 subsequent siblings) 3 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-22 9:36 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, ryov, xemul, fernando, balbir Hi, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > such as seek time. > > > > > > > > You can see the details of the benchmarks at: > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > (snip) > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > every device we want to control using rules. This looks little odd > > > atleast to me. Can't we keep it in line with rest of the controllers > > > where task grouping takes place using cgroup and rules are specified in > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > I've been also considering that this algorithm itself can be implemented > > in the block layer directly. > > > > Although, the current implementation has merits. It is flexible. > > - Dm-ioband can be place anywhere you like, which may be right before > > the I/O schedulers or may be placed on top of LVM devices. > > Hi, > > An rb-tree per request queue also should be able to give us this > flexibility. Because logic is implemented per request queue, rules can be > placed at any layer. 
Either at bottom most layer where requests are > passed to elevator or at higher layer where requests will be passed to > lower level block devices in the stack. Just that we shall have to do > modifications to some of the higher level dm/md drivers to make use of > queuing cgroup requests and releasing cgroup requests to lower layers. Request descriptors are allocated just before passing I/O requests to the elevators. Even if you move the descriptor allocation point before calling the dm/md drivers, the drivers can't make use of them. When one of the dm drivers accepts an I/O request, the request won't have either a real device number or a real sector number. The request will be re-mapped to another sector of another device in every dm driver. The request may even be replicated there. So it is really hard to find the right request queue to put the request into and to sort requests on that queue. > > - It supports partition based bandwidth control which can work without > > cgroups, which is quite easy to use of. > > > - It is independent to any I/O schedulers including ones which will > > be introduced in the future. > > This scheme should also be independent of any of the IO schedulers. We > might have to do small changes in IO-schedulers to decouple the things > from __make_request() a bit to insert rb-tree in between __make_request() > and IO-scheduler. Otherwise fundamentally, this approach should not > require any major modifications to IO-schedulers. > > > > > I also understand it's will be hard to set up without some tools > > such as lvm commands. > > > > That's something I wish to avoid. If we can keep it simple by doing > grouping using cgroup and allow one line rules in cgroup it would be nice. It's possible the algorithm of dm-ioband could be placed in the block layer if this is really a big problem. But I doubt it could control every kind of block I/O as we wish, since the interface that cgroups support is quite poor. 
> > > To avoid creation of stacking another device (dm-ioband) on top of every > > > device we want to subject to rules, I was thinking of maintaining an > > > rb-tree per request queue. Requests will first go into this rb-tree upon > > > __make_request() and then will filter down to elevator associated with the > > > queue (if there is one). This will provide us the control of releasing > > > bio's to elevaor based on policies (proportional weight, max bandwidth > > > etc) and no need of stacking additional block device. > > > > I think it's a bit late to control I/O requests there, since process > > may be blocked in get_request_wait when the I/O load is high. > > Please imagine the situation that cgroups with low bandwidths are > > consuming most of "struct request"s while another cgroup with a high > > bandwidth is blocked and can't get enough "struct request"s. > > > > It means cgroups that issues lot of I/O request can win the game. > > > > Ok, this is a good point. Because number of struct requests are limited > and they seem to be allocated on first come first serve basis, so if a > cgroup is generating lot of IO, then it might win. > > But dm-ioband will face the same issue. Nope. Dm-ioband doesn't have this issue, since it works before the descriptors are allocated. Only I/O requests that dm-ioband has passed can allocate descriptors. > Essentially it is also a request > queue and it will have limited number of request descriptors. Have you > modified the logic somewhere for allocation of request descriptors to the > waiting processes based on their weights? If yes, the logic probably can > be implemented here too. I feel this is almost what dm-ioband is doing. > Thanks > Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080922.183651.62951479.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org> @ 2008-09-22 14:30 ` Vivek Goyal 0 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-22 14:30 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 On Mon, Sep 22, 2008 at 06:36:51PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > such as seek time. > > > > > > > > > > You can see the details of the benchmarks at: > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > (snip) > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > every device we want to control using rules. This looks little odd > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > where task grouping takes place using cgroup and rules are specified in > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > I've been also considering that this algorithm itself can be implemented > > > in the block layer directly. > > > > > > Although, the current implementation has merits. It is flexible. 
> > > - Dm-ioband can be placed anywhere you like, which may be right before
> > >   the I/O schedulers or may be on top of LVM devices.
> >
> > Hi,
> >
> > An rb-tree per request queue should also be able to give us this
> > flexibility. Because the logic is implemented per request queue, rules can
> > be placed at any layer: either at the bottom-most layer, where requests are
> > passed to the elevator, or at a higher layer, where requests will be passed
> > to lower-level block devices in the stack. Just that we shall have to
> > modify some of the higher-level dm/md drivers to make use of
> > queuing cgroup requests and releasing cgroup requests to lower layers.
>
> Request descriptors are allocated just right before passing I/O requests
> to the elevators. Even if you move the descriptor allocation point
> before calling the dm/md drivers, the drivers can't make use of them.
>

You are right; request descriptors are currently allocated at the bottom-most
layer. Anyway, in the rb-tree, we put bio-cgroups as logical elements, and
every bio-cgroup then contains a list of either bios or request descriptors.
So what kind of list a bio-cgroup maintains can depend on whether it is a
higher-layer driver (it will maintain bios) or a lower-layer driver (it will
maintain a list of request descriptors per bio-cgroup).

So basically the mechanism of maintaining an rb-tree can be completely
ignorant of whether a driver is keeping track of bios or keeping track of
requests per cgroup.

> When one of the dm drivers accepts an I/O request, the request
> won't have either a real device number or a real sector number.
> The request will be re-mapped to another sector of another device
> in every dm driver. The request may even be replicated there.
> So it is really hard to find the right request queue to put
> the request into and sort them on the queue.
Hmm.., I thought that all the incoming requests to a dm/md driver will
remain in a single queue maintained by that driver (irrespective of which
request queues these requests go into in lower layers after replication or
other operations). I am not very familiar with the dm/md implementation.
I will read more about it....

> > > - It supports partition-based bandwidth control which can work without
> > >   cgroups, and which is quite easy to use.
> > >
> > > - It is independent of any I/O schedulers, including ones which will
> > >   be introduced in the future.
> >
> > This scheme should also be independent of any of the IO schedulers. We
> > might have to make small changes in the IO schedulers to decouple things
> > from __make_request() a bit, to insert the rb-tree in between
> > __make_request() and the IO scheduler. Otherwise, fundamentally, this
> > approach should not require any major modifications to IO schedulers.
> >
> > > I also understand it will be hard to set up without some tools
> > > such as lvm commands.
> >
> > That's something I wish to avoid. If we can keep it simple by doing
> > grouping using cgroup and allow one-line rules in cgroup, it would be nice.
>
> It's possible the algorithm of dm-ioband can be placed in the block layer
> if it is really a big problem.
> But I doubt it can control every block I/O as we wish, since
> the interface the cgroup supports is quite poor.

I had a question regarding the cgroup interface. I am assuming that in a
system, one will be using other controllers as well, apart from the
IO controller. Other controllers will be using cgroup as the grouping
mechanism. Now coming up with an additional grouping mechanism for only the
io-controller seems a little odd to me. It will make the job of higher-level
management software harder.
Looking at the dm-ioband grouping examples given in the patches, I think the
cases of grouping based on pid, pgrp, uid and kvm can be handled by creating
the right cgroup and making sure applications are launched/moved into the
right cgroup by user-space tools. I think keeping the grouping mechanism in
line with the rest of the controllers should help, because a uniform
grouping mechanism should make life simpler.

I am not very sure about moving the dm-ioband algorithm into the block
layer. It looks like it will make life simpler at least in terms of
configuration.

> > > > To avoid stacking another device (dm-ioband) on top of every
> > > > device we want to subject to rules, I was thinking of maintaining an
> > > > rb-tree per request queue. Requests will first go into this rb-tree
> > > > upon __make_request() and then will filter down to the elevator
> > > > associated with the queue (if there is one). This will give us control
> > > > of releasing bios to the elevator based on policies (proportional
> > > > weight, max bandwidth, etc.) with no need to stack an additional
> > > > block device.
> > >
> > > I think it's a bit late to control I/O requests there, since a process
> > > may be blocked in get_request_wait when the I/O load is high.
> > > Please imagine the situation that cgroups with low bandwidths are
> > > consuming most of the "struct request"s while another cgroup with a
> > > high bandwidth is blocked and can't get enough "struct request"s.
> > >
> > > It means cgroups that issue lots of I/O requests can win the game.
> >
> > Ok, this is a good point. Because the number of struct requests is
> > limited and they seem to be allocated on a first-come, first-served
> > basis, if a cgroup is generating a lot of IO, then it might win.
> >
> > But dm-ioband will face the same issue.
>
> Nope. Dm-ioband doesn't have this issue since it works before allocating
> the descriptors. Only I/O requests dm-ioband has passed can allocate their
> descriptors.
>

Ok. Got it.
dm-ioband does not block on allocation of request descriptors. It does seem
to be blocking in prevent_burst_bios(), but that would be per group, so it
should be fine.

That means, for the lower layers, one shall have to do request descriptor
allocation as per the cgroup weight to make sure a cgroup with a lower
weight does not get a higher % of the disk because it is generating more
requests.

One additional issue with my scheme I just noticed is that I am putting
bio-cgroups in the rb-tree. If there are stacked devices, then bios/requests
from the same cgroup can be at multiple levels of processing at the same
time. That would mean that a single cgroup needs to be in multiple rb-trees
at the same time in various layers. So I might have to create a temporary
object which can be associated with the cgroup, and get rid of that object
once I don't have the requests any more...

Well, implementing an rb-tree per request queue seems to be harder than I
had thought, especially taking care of decoupling the elevator and request
descriptor logic at lower layers. Long way to go..

Thanks
Vivek

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  2008-09-22 14:30 ` Vivek Goyal
@ 2008-09-24  8:29   ` Hirokazu Takahashi
  ` (5 subsequent siblings)
  6 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-24 8:29 UTC (permalink / raw)
  To: vgoyal
  Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, righi.andrea, agk, xemul, fernando, balbir

Hi,

> > > > > > I have got excellent results of dm-ioband, which controls the disk I/O
> > > > > > bandwidth even when it accepts delayed write requests.
> > > > > >
> > > > > > This time, I ran some benchmarks with high-end storage. The
> > > > > > reason was to avoid a performance bottleneck due to mechanical factors
> > > > > > such as seek time.
> > > > > >
> > > > > > You can see the details of the benchmarks at:
> > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/
> > > >
> > > > (snip)
> > > >
> > > > > Secondly, why do we have to create an additional dm-ioband device for
> > > > > every device we want to control using rules? This looks a little odd,
> > > > > at least to me. Can't we keep it in line with the rest of the controllers,
> > > > > where task grouping takes place using cgroup and rules are specified in
> > > > > the cgroup itself (the way Andrea Righi does for the io-throttling patches)?
> > > >
> > > > It isn't essential that dm-ioband is implemented as one of the device-mappers.
> > > > I've also been considering that this algorithm itself could be implemented
> > > > in the block layer directly.
> > > >
> > > > Although, the current implementation has merits. It is flexible.
> > > > - Dm-ioband can be placed anywhere you like, which may be right before
> > > >   the I/O schedulers or may be on top of LVM devices.
> > >
> > > Hi,
> > >
> > > An rb-tree per request queue should also be able to give us this
> > > flexibility. Because the logic is implemented per request queue, rules can
> > > be placed at any layer.
> > > Either at the bottom-most layer, where requests are passed to the
> > > elevator, or at a higher layer, where requests will be passed to
> > > lower-level block devices in the stack. Just that we shall have to
> > > modify some of the higher-level dm/md drivers to make use of
> > > queuing cgroup requests and releasing cgroup requests to lower layers.
> >
> > Request descriptors are allocated just right before passing I/O requests
> > to the elevators. Even if you move the descriptor allocation point
> > before calling the dm/md drivers, the drivers can't make use of them.
> >
>
> You are right; request descriptors are currently allocated at the
> bottom-most layer. Anyway, in the rb-tree, we put bio-cgroups as logical
> elements, and every bio-cgroup then contains a list of either bios or
> request descriptors. So what kind of list a bio-cgroup maintains can
> depend on whether it is a higher-layer driver (it will maintain bios) or
> a lower-layer driver (it will maintain a list of request descriptors per
> bio-cgroup).

I'm getting confused about your idea. I thought you wanted to make each
cgroup have its own rb-tree, and wanted to make all the layers share the
same rb-tree. If so, are you going to put different things into the same
tree? Do you even want all the I/O schedulers to use the same tree?

Are you going to block request descriptors in the tree? From the viewpoint
of performance, all the request descriptors should be passed to the I/O
schedulers, since the maximum number of request descriptors is limited.

And I still don't understand: if you want to make your rb-tree work
efficiently, you need to put a lot of bios or request descriptors into the
tree. Is that what you are going to do? On the other hand, dm-ioband tries
to minimize the number of bios blocked, and I have a plan to reduce the
maximum number that can be blocked there.

Sorry to bother you; I just don't understand the concept clearly.
> So basically the mechanism of maintaining an rb-tree can be completely
> ignorant of whether a driver is keeping track of bios or keeping track
> of requests per cgroup.

I don't care whether the queue is implemented as an rb-tree or some kind of
list, because they are logically the same thing.

> > When one of the dm drivers accepts an I/O request, the request
> > won't have either a real device number or a real sector number.
> > The request will be re-mapped to another sector of another device
> > in every dm driver. The request may even be replicated there.
> > So it is really hard to find the right request queue to put
> > the request into and sort them on the queue.
>
> Hmm.., I thought that all the incoming requests to a dm/md driver will
> remain in a single queue maintained by that driver (irrespective of which
> request queues these requests go into in lower layers after replication
> or other operations). I am not very familiar with the dm/md
> implementation. I will read more about it....

They never look into the queues maintained in the drivers. Some of the
drivers have their own little queues and others don't. Some may just modify
the sector numbers of I/O requests or may create new I/O requests
themselves. Others, such as md-raid5, have their own queues to control
I/Os, where a write request may cause several read requests and has to
wait for their completions before the actual write starts.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-22 14:30 ` Vivek Goyal @ 2008-09-24 8:29 ` Hirokazu Takahashi 2008-09-24 8:29 ` Hirokazu Takahashi ` (5 subsequent siblings) 6 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-24 8:29 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > such as seek time. > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > > (snip) > > > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > every device we want to control using rules. This looks little odd > > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > > where task grouping takes place using cgroup and rules are specified in > > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > > I've been also considering that this algorithm itself can be implemented > > > > in the block layer directly. > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > - Dm-ioband can be place anywhere you like, which may be right before > > > > the I/O schedulers or may be placed on top of LVM devices. > > > > > > Hi, > > > > > > An rb-tree per request queue also should be able to give us this > > > flexibility. Because logic is implemented per request queue, rules can be > > > placed at any layer. 
Either at bottom most layer where requests are > > > passed to elevator or at higher layer where requests will be passed to > > > lower level block devices in the stack. Just that we shall have to do > > > modifications to some of the higher level dm/md drivers to make use of > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > > > Request descriptors are allocated just right before passing I/O requests > > to the elevators. Even if you move the descriptor allocation point > > before calling the dm/md drivers, the drivers can't make use of them. > > > > You are right. request descriptors are currently allocated at bottom > most layer. Anyway, in the rb-tree, we put bio cgroups as logical elements > and every bio cgroup then contains the list of either bios or requeust > descriptors. So what kind of list bio-cgroup maintains can depend on > whether it is a higher layer driver (will maintain bios) or a lower layer > driver (will maintain list of request descriptors per bio-cgroup). I'm getting confused about your idea. I thought you wanted to make each cgroup have its own rb-tree, and wanted to make all the layers share the same rb-tree. If so, are you going to put different things into the same tree? Do you even want all the I/O schedlers use the same tree? Are you going to block request descriptors in the tree? From the view point of performance, all the request descriptors should be passed to the I/O schedulers, since the maximum number of request descriptors is limited. And I still don't understand if you want to make your rb-tree work efficiently, you need to put a lot of bios or request descriptors into the tree. Is that what you are going to do? On the other hand, dm-ioband tries to minimize to have bios blocked. And I have a plan on reducing the maximum number that can be blocked there. Sorry to bother you that I just don't understand the concept clearly. 
> So basically mechanism of maintaining an rb-tree can be completely > ignorant of the fact whether a driver is keeping track of bios or keeping > track of requests per cgroup. I don't care whether the queue is implemented as an rb-tree or some kind of list because they are logically the same thing. > > When one of the dm drivers accepts a I/O request, the request > > won't have either a real device number or a real sector number. > > The request will be re-mapped to another sector of another device > > in every dm drivers. The request may even be replicated there. > > So it is really hard to find the right request queue to put > > the request into and sort them on the queue. > > Hmm.., I thought that all the incoming requests to dm/md driver will > remain in a single queue maintained by that drvier (irrespective of the > fact in which request queue these requests go in lower layers after > replication or other operation). I am not very familiar with dm/md > implementation. I will read more about it.... They never look into the queues maintained in drivers. Some of them have their own little queues and others don't. Some may just modify the sector numbers of I/O requests or may create new I/O requests themselves. Others, such as md-raid5, have their own queues to control I/Os, where a write request may cause several read requests and has to wait for their completion before the actual write starts. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080924.172937.72827863.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org> @ 2008-09-24 14:03 ` Vivek Goyal 0 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-24 14:03 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 On Wed, Sep 24, 2008 at 05:29:37PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > > such as seek time. > > > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > > > > (snip) > > > > > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > > every device we want to control using rules. This looks little odd > > > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > > > where task grouping takes place using cgroup and rules are specified in > > > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > > > I've been also considering that this algorithm itself can be implemented > > > > > in the block layer directly. 
> > > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > > - Dm-ioband can be place anywhere you like, which may be right before > > > > > the I/O schedulers or may be placed on top of LVM devices. > > > > > > > > Hi, > > > > > > > > An rb-tree per request queue also should be able to give us this > > > > flexibility. Because logic is implemented per request queue, rules can be > > > > placed at any layer. Either at bottom most layer where requests are > > > > passed to elevator or at higher layer where requests will be passed to > > > > lower level block devices in the stack. Just that we shall have to do > > > > modifications to some of the higher level dm/md drivers to make use of > > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > > > > > Request descriptors are allocated just right before passing I/O requests > > > to the elevators. Even if you move the descriptor allocation point > > > before calling the dm/md drivers, the drivers can't make use of them. > > > > > > > You are right. request descriptors are currently allocated at bottom > > most layer. Anyway, in the rb-tree, we put bio cgroups as logical elements > > and every bio cgroup then contains the list of either bios or requeust > > descriptors. So what kind of list bio-cgroup maintains can depend on > > whether it is a higher layer driver (will maintain bios) or a lower layer > > driver (will maintain list of request descriptors per bio-cgroup). > > I'm getting confused about your idea. > > I thought you wanted to make each cgroup have its own rb-tree, > and wanted to make all the layers share the same rb-tree. > If so, are you going to put different things into the same tree? > Do you even want all the I/O schedlers use the same tree? > Ok, I will give more details of the thought process. I was thinking of maintaining an rb-tree per request queue and not an rb-tree per cgroup.
This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued from the tasks from that cgroup. Every bio entering the request queue through the __make_request() function will first be queued in one of the nodes in this rb-tree, depending on which cgroup that bio belongs to. Once the bios are buffered in the rb-tree, we release them to the underlying elevator depending on the proportionate weight of the nodes/cgroups. Some more details which I was trying to implement yesterday: there will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. A bio_group object will be created for each request queue where a bio from the bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system, so a single object cannot serve the purpose as it cannot be on many rb-trees at the same time. Hence we create one sub-object which will keep track of the bios belonging to one cgroup on a particular request queue. Each bio_group will contain a list of bios, and this bio_group object will be a node in the rb-tree of the request queue. For example, let's say there are two request queues in the system, q1 and q2 (say they belong to /dev/sda and /dev/sdb), and a task t1 in /cgroup/io/test1 is issuing I/O to both /dev/sda and /dev/sdb. The bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same can be extended to stacked devices also. I am still trying to implement it and hopefully it is a doable idea.
I think at the end of the day it will be something very close to the dm-ioband algorithm, just that there will be no lvm driver and no notion of a separate dm-ioband device. > Are you going to block request descriptors in the tree? > From the view point of performance, all the request descriptors > should be passed to the I/O schedulers, since the maximum number > of request descriptors is limited. > In my initial implementation I was queuing the request descriptors. Then you mentioned that it is not a good idea because potentially a cgroup issuing more requests might win the race. Last night I thought: then why not start queuing the bios as they are submitted to the request_queue via __make_request(), and then release them to the underlying elevator or underlying request queue (in the case of a stacked device). This will remove a few issues. - All the layers can uniformly queue bios, with no intermixing of queued bios and request descriptors. - It will get rid of the issue of one cgroup winning the race because of the limited number of request descriptors. > And I still don't understand if you want to make your rb-tree > work efficiently, you need to put a lot of bios or request descriptors > into the tree. Is that what you are going to do? > On the other hand, dm-ioband tries to minimize to have bios blocked. > And I have a plan on reducing the maximum number that can be > blocked there. > Now I am planning to queue bios, and probably there is no need to queue request descriptors. I think that's what dm-ioband is doing: queueing bios for cgroups per ioband device. Thinking more about it, in the dm-ioband case you seem to be buffering bios from various cgroups on a separate request queue belonging to the dm-ioband device. I was thinking of moving all that buffering logic to the existing request queues instead of creating another request queue on top of the request queue I want to control (the dm-ioband device). > Sorry to bother you that I just don't understand the concept clearly.
> > > So basically mechanism of maintaining an rb-tree can be completely > > ignorant of the fact whether a driver is keeping track of bios or keeping > > track of requests per cgroup. > > I don't care whether the queue is implemented as a rb-tee or some > kind of list because they are logically the same thing. That's true. rb-tree or list is just a data structure detail; it is not important. The core thing I am trying to achieve is: is there a way I can get rid of the notion of creating a separate dm-ioband device for every device I want to control? Is it just me who finds the creation of dm-ioband devices odd and difficult to manage, or are there other people who think it would be nice if we could get rid of it? Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 8:29 ` Hirokazu Takahashi (?) (?) @ 2008-09-24 14:03 ` Vivek Goyal -1 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-24 14:03 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir On Wed, Sep 24, 2008 at 05:29:37PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > > such as seek time. > > > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > > > > (snip) > > > > > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > > every device we want to control using rules. This looks little odd > > > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > > > where task grouping takes place using cgroup and rules are specified in > > > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > > > I've been also considering that this algorithm itself can be implemented > > > > > in the block layer directly. > > > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > > - Dm-ioband can be place anywhere you like, which may be right before > > > > > the I/O schedulers or may be placed on top of LVM devices. > > > > > > > > Hi, > > > > > > > > An rb-tree per request queue also should be able to give us this > > > > flexibility. 
Because logic is implemented per request queue, rules can be > > > > placed at any layer. Either at bottom most layer where requests are > > > > passed to elevator or at higher layer where requests will be passed to > > > > lower level block devices in the stack. Just that we shall have to do > > > > modifications to some of the higher level dm/md drivers to make use of > > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > > > > > Request descriptors are allocated just right before passing I/O requests > > > to the elevators. Even if you move the descriptor allocation point > > > before calling the dm/md drivers, the drivers can't make use of them. > > > > > > > You are right. request descriptors are currently allocated at bottom > > most layer. Anyway, in the rb-tree, we put bio cgroups as logical elements > > and every bio cgroup then contains the list of either bios or requeust > > descriptors. So what kind of list bio-cgroup maintains can depend on > > whether it is a higher layer driver (will maintain bios) or a lower layer > > driver (will maintain list of request descriptors per bio-cgroup). > > I'm getting confused about your idea. > > I thought you wanted to make each cgroup have its own rb-tree, > and wanted to make all the layers share the same rb-tree. > If so, are you going to put different things into the same tree? > Do you even want all the I/O schedlers use the same tree? > Ok, I will give more details of the thought process. I was thinking of maintaing an rb-tree per request queue and not an rb-tree per cgroup. This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued from the tasks from that cgroup. Every bio entering the request queue through __make_request() function first will be queued in one of the nodes in this rb-tree, depending on which cgroup that bio belongs to. 
Once the bios are buffered in rb-tree, we release these to underlying elevator depending on the proportionate weight of the nodes/cgroups. Some more details which I was trying to implement yesterday. There will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. Each bio_group object will be created for each request queue where a bio from bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system. So a single object can not serve the purpose as it can not be on many rb-trees at the same time. Hence create one sub object which will keep track of bios belonging to one cgroup on a particular request queue. Each bio_group will contain a list of bios and this bio_group object will be a node in the rb-tree of request queue. For example. Lets say there are two request queues in the system q1 and q2 (lets say they belong to /dev/sda and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both for /dev/sda and /dev/sdb. bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same can be extended for stacked devices also. I am still trying to implementing it and hopefully this is doable idea. I think at the end of the day it will be something very close to dm-ioband algorithm just that there will be no lvm driver and no notion of separate dm-ioband device. > Are you going to block request descriptors in the tree? > >From the view point of performance, all the request descriptors > should be passed to the I/O schedulers, since the maximum number > of request descriptors is limited. > In my initial implementation I was queuing the request descriptors. 
Then you mentioned that it is not a good idea because potentially a cgroup issuing more requests might win the race. Yesterday night I thought, then why not start queuing the bios as they are submitted to the request_queue, using __make_request() and then release these to underlying elevator or underlying request queue (in case of stacked device). This will remove few issues. - All the layers can uniformly queue bios and no intermixing of queuing bios and request descriptors. - Will get rid of issue of one cgroup winning the race because of limited number of request descriptors. > And I still don't understand if you want to make your rb-tree > work efficiently, you need to put a lot of bios or request descriptors > into the tree. Is that what you are going to do? > On the other hand, dm-ioband tries to minimize to have bios blocked. > And I have a plan on reducing the maximum number that can be > blocked there. > Now I am planning to queue bios and probably there is no need to queue request descriptors. I think that's what dm-ioband is doing. Queueing bios for cgroups per io-band device. Thinking more about it, In dm-ioband case, you seem to be buffering bios from various cgroups on a separate request queue belonging to dm-ioband device. I was thinking of moving all that buffering logic to existing request queues instead of creating another request queue on top of request queue I want to control (dm-ioband device). > Sorry to bother you that I just don't understand the concept clearly. > > > So basically mechanism of maintaining an rb-tree can be completely > > ignorant of the fact whether a driver is keeping track of bios or keeping > > track of requests per cgroup. > > I don't care whether the queue is implemented as a rb-tee or some > kind of list because they are logically the same thing. That's true. rb-tree or list is just data structure detail. It is not important. 
The core thing I am trying to achive is that is there a way that I can get rid of notion of creating a separate dm-ioband device for every device I want to control. Is it just me who finds creation of dm-ioband devices odd and difficult to manage or there are other people who think that it would be nice if we can get rid of it? Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 8:29 ` Hirokazu Takahashi @ 2008-09-24 14:03 ` Vivek Goyal -1 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-24 14:03 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir On Wed, Sep 24, 2008 at 05:29:37PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > > such as seek time. > > > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > > > > (snip) > > > > > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > > every device we want to control using rules. This looks little odd > > > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > > > where task grouping takes place using cgroup and rules are specified in > > > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > > > I've been also considering that this algorithm itself can be implemented > > > > > in the block layer directly. > > > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > > - Dm-ioband can be place anywhere you like, which may be right before > > > > > the I/O schedulers or may be placed on top of LVM devices. > > > > > > > > Hi, > > > > > > > > An rb-tree per request queue also should be able to give us this > > > > flexibility. 
Because logic is implemented per request queue, rules can be > > > > placed at any layer. Either at bottom most layer where requests are > > > > passed to elevator or at higher layer where requests will be passed to > > > > lower level block devices in the stack. Just that we shall have to do > > > > modifications to some of the higher level dm/md drivers to make use of > > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > > > > > Request descriptors are allocated just right before passing I/O requests > > > to the elevators. Even if you move the descriptor allocation point > > > before calling the dm/md drivers, the drivers can't make use of them. > > > > > > > You are right. request descriptors are currently allocated at bottom > > most layer. Anyway, in the rb-tree, we put bio cgroups as logical elements > > and every bio cgroup then contains the list of either bios or requeust > > descriptors. So what kind of list bio-cgroup maintains can depend on > > whether it is a higher layer driver (will maintain bios) or a lower layer > > driver (will maintain list of request descriptors per bio-cgroup). > > I'm getting confused about your idea. > > I thought you wanted to make each cgroup have its own rb-tree, > and wanted to make all the layers share the same rb-tree. > If so, are you going to put different things into the same tree? > Do you even want all the I/O schedlers use the same tree? > Ok, I will give more details of the thought process. I was thinking of maintaing an rb-tree per request queue and not an rb-tree per cgroup. This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued from the tasks from that cgroup. Every bio entering the request queue through __make_request() function first will be queued in one of the nodes in this rb-tree, depending on which cgroup that bio belongs to. 
Once the bios are buffered in rb-tree, we release these to underlying elevator depending on the proportionate weight of the nodes/cgroups. Some more details which I was trying to implement yesterday. There will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. Each bio_group object will be created for each request queue where a bio from bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system. So a single object can not serve the purpose as it can not be on many rb-trees at the same time. Hence create one sub object which will keep track of bios belonging to one cgroup on a particular request queue. Each bio_group will contain a list of bios and this bio_group object will be a node in the rb-tree of request queue. For example. Lets say there are two request queues in the system q1 and q2 (lets say they belong to /dev/sda and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both for /dev/sda and /dev/sdb. bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same can be extended for stacked devices also. I am still trying to implementing it and hopefully this is doable idea. I think at the end of the day it will be something very close to dm-ioband algorithm just that there will be no lvm driver and no notion of separate dm-ioband device. > Are you going to block request descriptors in the tree? > >From the view point of performance, all the request descriptors > should be passed to the I/O schedulers, since the maximum number > of request descriptors is limited. > In my initial implementation I was queuing the request descriptors. 
Then you mentioned that it is not a good idea because potentially a cgroup issuing more requests might win the race. Yesterday night I thought, then why not start queuing the bios as they are submitted to the request_queue, using __make_request() and then release these to underlying elevator or underlying request queue (in case of stacked device). This will remove few issues. - All the layers can uniformly queue bios and no intermixing of queuing bios and request descriptors. - Will get rid of issue of one cgroup winning the race because of limited number of request descriptors. > And I still don't understand if you want to make your rb-tree > work efficiently, you need to put a lot of bios or request descriptors > into the tree. Is that what you are going to do? > On the other hand, dm-ioband tries to minimize to have bios blocked. > And I have a plan on reducing the maximum number that can be > blocked there. > Now I am planning to queue bios and probably there is no need to queue request descriptors. I think that's what dm-ioband is doing. Queueing bios for cgroups per io-band device. Thinking more about it, In dm-ioband case, you seem to be buffering bios from various cgroups on a separate request queue belonging to dm-ioband device. I was thinking of moving all that buffering logic to existing request queues instead of creating another request queue on top of request queue I want to control (dm-ioband device). > Sorry to bother you that I just don't understand the concept clearly. > > > So basically mechanism of maintaining an rb-tree can be completely > > ignorant of the fact whether a driver is keeping track of bios or keeping > > track of requests per cgroup. > > I don't care whether the queue is implemented as a rb-tee or some > kind of list because they are logically the same thing. That's true. rb-tree or list is just data structure detail. It is not important. 
The core thing I am trying to achive is that is there a way that I can get rid of notion of creating a separate dm-ioband device for every device I want to control. Is it just me who finds creation of dm-ioband devices odd and difficult to manage or there are other people who think that it would be nice if we can get rid of it? Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks @ 2008-09-24 14:03 ` Vivek Goyal 0 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-24 14:03 UTC (permalink / raw) To: Hirokazu Takahashi Cc: ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, righi.andrea, jens.axboe On Wed, Sep 24, 2008 at 05:29:37PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > > such as seek time. > > > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > > > > > > > (snip) > > > > > > > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > > every device we want to control using rules. This looks little odd > > > > > > atleast to me. Can't we keep it in line with rest of the controllers > > > > > > where task grouping takes place using cgroup and rules are specified in > > > > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > > > > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > > > > I've been also considering that this algorithm itself can be implemented > > > > > in the block layer directly. > > > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > > - Dm-ioband can be place anywhere you like, which may be right before > > > > > the I/O schedulers or may be placed on top of LVM devices. > > > > > > > > Hi, > > > > > > > > An rb-tree per request queue also should be able to give us this > > > > flexibility. 
Because logic is implemented per request queue, rules can be > > > > placed at any layer. Either at bottom most layer where requests are > > > > passed to elevator or at higher layer where requests will be passed to > > > > lower level block devices in the stack. Just that we shall have to do > > > > modifications to some of the higher level dm/md drivers to make use of > > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > > Request descriptors are allocated just right before passing I/O requests > > > to the elevators. Even if you move the descriptor allocation point > > > before calling the dm/md drivers, the drivers can't make use of them. > > > > You are right. request descriptors are currently allocated at bottom > > most layer. Anyway, in the rb-tree, we put bio cgroups as logical elements > > and every bio cgroup then contains the list of either bios or requeust > > descriptors. So what kind of list bio-cgroup maintains can depend on > > whether it is a higher layer driver (will maintain bios) or a lower layer > > driver (will maintain list of request descriptors per bio-cgroup). > > I'm getting confused about your idea. > > I thought you wanted to make each cgroup have its own rb-tree, > and wanted to make all the layers share the same rb-tree. > If so, are you going to put different things into the same tree? > Do you even want all the I/O schedlers use the same tree? > Ok, I will give more details of the thought process. I was thinking of maintaining an rb-tree per request queue and not an rb-tree per cgroup. This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued from the tasks from that cgroup. Every bio entering the request queue through the __make_request() function will first be queued in one of the nodes in this rb-tree, depending on which cgroup that bio belongs to. 
Once the bios are buffered in the rb-tree, we release these to the underlying elevator depending on the proportionate weight of the nodes/cgroups. Some more details which I was trying to implement yesterday. There will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. Each bio_group object will be created for each request queue where a bio from the bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system. So a single object can not serve the purpose as it can not be on many rb-trees at the same time. Hence create one sub object which will keep track of bios belonging to one cgroup on a particular request queue. Each bio_group will contain a list of bios, and this bio_group object will be a node in the rb-tree of the request queue. For example, let's say there are two request queues in the system, q1 and q2 (say they belong to /dev/sda and /dev/sdb). Let's say a task t1 in /cgroup/io/test1 is issuing io both for /dev/sda and /dev/sdb. The bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same can be extended for stacked devices also. I am still trying to implement it and hopefully this is a doable idea. I think at the end of the day it will be something very close to the dm-ioband algorithm, just that there will be no lvm driver and no notion of a separate dm-ioband device. > Are you going to block request descriptors in the tree? > From the view point of performance, all the request descriptors > should be passed to the I/O schedulers, since the maximum number > of request descriptors is limited. > In my initial implementation I was queuing the request descriptors. 
Then you mentioned that it is not a good idea because potentially a cgroup issuing more requests might win the race. Last night I thought: why not start queuing the bios as they are submitted to the request_queue, using __make_request(), and then release these to the underlying elevator or underlying request queue (in the case of a stacked device). This will remove a few issues. - All the layers can uniformly queue bios, with no intermixing of queued bios and request descriptors. - Will get rid of the issue of one cgroup winning the race because of the limited number of request descriptors. > And I still don't understand if you want to make your rb-tree > work efficiently, you need to put a lot of bios or request descriptors > into the tree. Is that what you are going to do? > On the other hand, dm-ioband tries to minimize to have bios blocked. > And I have a plan on reducing the maximum number that can be > blocked there. > Now I am planning to queue bios and probably there is no need to queue request descriptors. I think that's what dm-ioband is doing: queueing bios for cgroups per io-band device. Thinking more about it, in the dm-ioband case you seem to be buffering bios from various cgroups on a separate request queue belonging to the dm-ioband device. I was thinking of moving all that buffering logic to the existing request queues instead of creating another request queue on top of the request queue I want to control (the dm-ioband device). > Sorry to bother you that I just don't understand the concept clearly. > > > So basically mechanism of maintaining an rb-tree can be completely > > ignorant of the fact whether a driver is keeping track of bios or keeping > > track of requests per cgroup. > > I don't care whether the queue is implemented as a rb-tee or some > kind of list because they are logically the same thing. That's true. rb-tree vs. list is just a data-structure detail. It is not important. 
The core thing I am trying to achieve is: is there a way to get rid of the notion of creating a separate dm-ioband device for every device I want to control? Is it just me who finds the creation of dm-ioband devices odd and difficult to manage, or are there other people who think it would be nice if we could get rid of it? Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <20080924140355.GB547-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080924140355.GB547-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-09-26 16:11 ` Andrea Righi 0 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 16:11 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Vivek Goyal wrote: [snip] > Ok, I will give more details of the thought process. > > I was thinking of maintaing an rb-tree per request queue and not an > rb-tree per cgroup. This tree can contain all the bios submitted to that > request queue through __make_request(). Every node in the tree will represent > one cgroup and will contain a list of bios issued from the tasks from that > cgroup. > > Every bio entering the request queue through __make_request() function > first will be queued in one of the nodes in this rb-tree, depending on which > cgroup that bio belongs to. > > Once the bios are buffered in rb-tree, we release these to underlying > elevator depending on the proportionate weight of the nodes/cgroups. > > Some more details which I was trying to implement yesterday. > > There will be one bio_cgroup object per cgroup. This object will contain > many bio_group objects. Each bio_group object will be created for each > request queue where a bio from bio_cgroup is queued. Essentially the idea > is that bios belonging to a cgroup can be on various request queues in the > system. So a single object can not serve the purpose as it can not be on > many rb-trees at the same time. Hence create one sub object which will keep > track of bios belonging to one cgroup on a particular request queue. 
> > Each bio_group will contain a list of bios and this bio_group object will > be a node in the rb-tree of request queue. For example. Lets say there are > two request queues in the system q1 and q2 (lets say they belong to /dev/sda > and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both > for /dev/sda and /dev/sdb. > > bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group > objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree > and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of > bios issued by task t1 for /dev/sda and bio_group2 will contain a list of > bios issued by task t1 for /dev/sdb. I thought the same can be extended > for stacked devices also. > > I am still trying to implementing it and hopefully this is doable idea. > I think at the end of the day it will be something very close to dm-ioband > algorithm just that there will be no lvm driver and no notion of separate > dm-ioband device. Vivek, thanks for the detailed explanation. Just one comment: I guess that if we don't also change the per-process optimizations/improvements made by some IO schedulers, we can end up with undesirable behaviours. For example: CFQ uses the per-process iocontext to improve fairness between *all* the processes in a system. But it doesn't have the concept that there's a cgroup context on top of the processes. So, some optimizations made to guarantee fairness among processes could conflict with algorithms implemented at the cgroup layer, and potentially lead to undesirable behaviours. For example, an issue I'm experiencing with my cgroup-io-throttle patchset is that a cgroup can consistently increase its IO rate (always respecting the max limits) simply by increasing the number of IO worker tasks with respect to another cgroup with a lower number of IO workers. 
This is probably due to the fact that CFQ tries to give the same amount of "IO time" to all the tasks, without considering that they're organized in cgroups. I don't see this behaviour with noop or deadline, because they don't have the concept of iocontext. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 14:03 ` Vivek Goyal (?) (?) @ 2008-09-26 16:11 ` Andrea Righi [not found] ` <48DD09AD.2010200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> ` (2 more replies) -1 siblings, 3 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 16:11 UTC (permalink / raw) To: Vivek Goyal Cc: Hirokazu Takahashi, ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe Vivek Goyal wrote: [snip] > Ok, I will give more details of the thought process. > > I was thinking of maintaing an rb-tree per request queue and not an > rb-tree per cgroup. This tree can contain all the bios submitted to that > request queue through __make_request(). Every node in the tree will represent > one cgroup and will contain a list of bios issued from the tasks from that > cgroup. > > Every bio entering the request queue through __make_request() function > first will be queued in one of the nodes in this rb-tree, depending on which > cgroup that bio belongs to. > > Once the bios are buffered in rb-tree, we release these to underlying > elevator depending on the proportionate weight of the nodes/cgroups. > > Some more details which I was trying to implement yesterday. > > There will be one bio_cgroup object per cgroup. This object will contain > many bio_group objects. Each bio_group object will be created for each > request queue where a bio from bio_cgroup is queued. Essentially the idea > is that bios belonging to a cgroup can be on various request queues in the > system. So a single object can not serve the purpose as it can not be on > many rb-trees at the same time. Hence create one sub object which will keep > track of bios belonging to one cgroup on a particular request queue. > > Each bio_group will contain a list of bios and this bio_group object will > be a node in the rb-tree of request queue. For example. 
Lets say there are > two request queues in the system q1 and q2 (lets say they belong to /dev/sda > and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both > for /dev/sda and /dev/sdb. > > bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group > objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree > and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of > bios issued by task t1 for /dev/sda and bio_group2 will contain a list of > bios issued by task t1 for /dev/sdb. I thought the same can be extended > for stacked devices also. > > I am still trying to implementing it and hopefully this is doable idea. > I think at the end of the day it will be something very close to dm-ioband > algorithm just that there will be no lvm driver and no notion of separate > dm-ioband device. Vivek, thanks for the detailed explanation. Only a comment. I guess, if we don't change also the per-process optimizations/improvements made by some IO scheduler, I think we can have undesirable behaviours. For example: CFQ uses the per-process iocontext to improve fairness between *all* the processes in a system. But it doesn't have the concept that there's a cgroup context on-top-of the processes. So, some optimizations made to guarantee fairness among processes could conflict with algorithms implemented at the cgroup layer. And potentially lead to undesirable behaviours. For example an issue I'm experiencing with my cgroup-io-throttle patchset is that a cgroup can consistently increase the IO rate (always respecting the max limits), simply increasing the number of IO worker tasks respect to another cgroup with a lower number of IO workers. This is probably due to the fact the CFQ tries to give the same amount of "IO time" to all the tasks, without considering that they're organized in cgroup. I don't see this behaviour with noop or deadline, because they don't have the concept of iocontext. 
-Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <48DD09AD.2010200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48DD09AD.2010200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-26 17:11 ` Andrea Righi 0 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 17:11 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Andrea Righi wrote: > Vivek Goyal wrote: > [snip] >> Ok, I will give more details of the thought process. >> >> I was thinking of maintaing an rb-tree per request queue and not an >> rb-tree per cgroup. This tree can contain all the bios submitted to that >> request queue through __make_request(). Every node in the tree will represent >> one cgroup and will contain a list of bios issued from the tasks from that >> cgroup. >> >> Every bio entering the request queue through __make_request() function >> first will be queued in one of the nodes in this rb-tree, depending on which >> cgroup that bio belongs to. >> >> Once the bios are buffered in rb-tree, we release these to underlying >> elevator depending on the proportionate weight of the nodes/cgroups. >> >> Some more details which I was trying to implement yesterday. >> >> There will be one bio_cgroup object per cgroup. This object will contain >> many bio_group objects. Each bio_group object will be created for each >> request queue where a bio from bio_cgroup is queued. Essentially the idea >> is that bios belonging to a cgroup can be on various request queues in the >> system. So a single object can not serve the purpose as it can not be on >> many rb-trees at the same time. 
Hence create one sub object which will keep >> track of bios belonging to one cgroup on a particular request queue. >> >> Each bio_group will contain a list of bios and this bio_group object will >> be a node in the rb-tree of request queue. For example. Lets say there are >> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >> for /dev/sda and /dev/sdb. >> >> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >> bios issued by task t1 for /dev/sdb. I thought the same can be extended >> for stacked devices also. >> >> I am still trying to implementing it and hopefully this is doable idea. >> I think at the end of the day it will be something very close to dm-ioband >> algorithm just that there will be no lvm driver and no notion of separate >> dm-ioband device. > > Vivek, thanks for the detailed explanation. Only a comment. I guess, if > we don't change also the per-process optimizations/improvements made by > some IO scheduler, I think we can have undesirable behaviours. > > For example: CFQ uses the per-process iocontext to improve fairness > between *all* the processes in a system. But it doesn't have the concept > that there's a cgroup context on-top-of the processes. > > So, some optimizations made to guarantee fairness among processes could > conflict with algorithms implemented at the cgroup layer. And > potentially lead to undesirable behaviours. 
> > For example an issue I'm experiencing with my cgroup-io-throttle > patchset is that a cgroup can consistently increase the IO rate (always > respecting the max limits), simply increasing the number of IO worker > tasks respect to another cgroup with a lower number of IO workers. This > is probably due to the fact the CFQ tries to give the same amount of > "IO time" to all the tasks, without considering that they're organized > in cgroup. BTW this is why I proposed to use a single shared iocontext for all the processes running in the same cgroup. Anyway, this is not the best solution, because in this way all the IO requests coming from a cgroup will be queued to the same cfq queue. If I'm not wrong in this way we would implement noop (FIFO) between tasks belonging to the same cgroup and CFQ between cgroups. But, at least for this particular case, we would be able to provide fairness among cgroups. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 16:11 ` Andrea Righi [not found] ` <48DD09AD.2010200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-26 17:11 ` Andrea Righi 2008-09-26 17:11 ` Andrea Righi 2 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 17:11 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, Hirokazu Takahashi, dm-devel, agk, xemul, fernando, balbir Andrea Righi wrote: > Vivek Goyal wrote: > [snip] >> Ok, I will give more details of the thought process. >> >> I was thinking of maintaing an rb-tree per request queue and not an >> rb-tree per cgroup. This tree can contain all the bios submitted to that >> request queue through __make_request(). Every node in the tree will represent >> one cgroup and will contain a list of bios issued from the tasks from that >> cgroup. >> >> Every bio entering the request queue through __make_request() function >> first will be queued in one of the nodes in this rb-tree, depending on which >> cgroup that bio belongs to. >> >> Once the bios are buffered in rb-tree, we release these to underlying >> elevator depending on the proportionate weight of the nodes/cgroups. >> >> Some more details which I was trying to implement yesterday. >> >> There will be one bio_cgroup object per cgroup. This object will contain >> many bio_group objects. Each bio_group object will be created for each >> request queue where a bio from bio_cgroup is queued. Essentially the idea >> is that bios belonging to a cgroup can be on various request queues in the >> system. So a single object can not serve the purpose as it can not be on >> many rb-trees at the same time. Hence create one sub object which will keep >> track of bios belonging to one cgroup on a particular request queue. >> >> Each bio_group will contain a list of bios and this bio_group object will >> be a node in the rb-tree of request queue. For example. 
Lets say there are >> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >> for /dev/sda and /dev/sdb. >> >> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >> bios issued by task t1 for /dev/sdb. I thought the same can be extended >> for stacked devices also. >> >> I am still trying to implementing it and hopefully this is doable idea. >> I think at the end of the day it will be something very close to dm-ioband >> algorithm just that there will be no lvm driver and no notion of separate >> dm-ioband device. > > Vivek, thanks for the detailed explanation. Only a comment. I guess, if > we don't change also the per-process optimizations/improvements made by > some IO scheduler, I think we can have undesirable behaviours. > > For example: CFQ uses the per-process iocontext to improve fairness > between *all* the processes in a system. But it doesn't have the concept > that there's a cgroup context on-top-of the processes. > > So, some optimizations made to guarantee fairness among processes could > conflict with algorithms implemented at the cgroup layer. And > potentially lead to undesirable behaviours. > > For example an issue I'm experiencing with my cgroup-io-throttle > patchset is that a cgroup can consistently increase the IO rate (always > respecting the max limits), simply increasing the number of IO worker > tasks respect to another cgroup with a lower number of IO workers. This > is probably due to the fact the CFQ tries to give the same amount of > "IO time" to all the tasks, without considering that they're organized > in cgroup. 
BTW this is why I proposed to use a single shared iocontext for all the processes running in the same cgroup. Anyway, this is not the best solution, because in this way all the IO requests coming from a cgroup will be queued to the same cfq queue. If I'm not wrong, in this way we would implement noop (FIFO) between tasks belonging to the same cgroup and CFQ between cgroups. But, at least for this particular case, we would be able to provide fairness among cgroups. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 16:11 ` Andrea Righi [not found] ` <48DD09AD.2010200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2008-09-26 17:11 ` Andrea Righi @ 2008-09-26 17:11 ` Andrea Righi [not found] ` <48DD17A9.9080607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> ` (4 more replies) 2 siblings, 5 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 17:11 UTC (permalink / raw) To: Vivek Goyal Cc: Hirokazu Takahashi, ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe Andrea Righi wrote: > Vivek Goyal wrote: > [snip] >> Ok, I will give more details of the thought process. >> >> I was thinking of maintaing an rb-tree per request queue and not an >> rb-tree per cgroup. This tree can contain all the bios submitted to that >> request queue through __make_request(). Every node in the tree will represent >> one cgroup and will contain a list of bios issued from the tasks from that >> cgroup. >> >> Every bio entering the request queue through __make_request() function >> first will be queued in one of the nodes in this rb-tree, depending on which >> cgroup that bio belongs to. >> >> Once the bios are buffered in rb-tree, we release these to underlying >> elevator depending on the proportionate weight of the nodes/cgroups. >> >> Some more details which I was trying to implement yesterday. >> >> There will be one bio_cgroup object per cgroup. This object will contain >> many bio_group objects. Each bio_group object will be created for each >> request queue where a bio from bio_cgroup is queued. Essentially the idea >> is that bios belonging to a cgroup can be on various request queues in the >> system. So a single object can not serve the purpose as it can not be on >> many rb-trees at the same time. Hence create one sub object which will keep >> track of bios belonging to one cgroup on a particular request queue. 
>> >> Each bio_group will contain a list of bios and this bio_group object will >> be a node in the rb-tree of request queue. For example. Lets say there are >> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >> for /dev/sda and /dev/sdb. >> >> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >> bios issued by task t1 for /dev/sdb. I thought the same can be extended >> for stacked devices also. >> >> I am still trying to implementing it and hopefully this is doable idea. >> I think at the end of the day it will be something very close to dm-ioband >> algorithm just that there will be no lvm driver and no notion of separate >> dm-ioband device. > > Vivek, thanks for the detailed explanation. Only a comment. I guess, if > we don't change also the per-process optimizations/improvements made by > some IO scheduler, I think we can have undesirable behaviours. > > For example: CFQ uses the per-process iocontext to improve fairness > between *all* the processes in a system. But it doesn't have the concept > that there's a cgroup context on-top-of the processes. > > So, some optimizations made to guarantee fairness among processes could > conflict with algorithms implemented at the cgroup layer. And > potentially lead to undesirable behaviours. > > For example an issue I'm experiencing with my cgroup-io-throttle > patchset is that a cgroup can consistently increase the IO rate (always > respecting the max limits), simply increasing the number of IO worker > tasks respect to another cgroup with a lower number of IO workers. 
This > is probably due to the fact the CFQ tries to give the same amount of > "IO time" to all the tasks, without considering that they're organized > in cgroup. BTW this is why I proposed to use a single shared iocontext for all the processes running in the same cgroup. Anyway, this is not the best solution, because in this way all the IO requests coming from a cgroup will be queued to the same cfq queue. If I'm not wrong, in this way we would implement noop (FIFO) between tasks belonging to the same cgroup and CFQ between cgroups. But, at least for this particular case, we would be able to provide fairness among cgroups. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <48DD17A9.9080607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48DD17A9.9080607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-26 17:30 ` Andrea Righi 2008-09-29 12:07 ` Hirokazu Takahashi 1 sibling, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 17:30 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Andrea Righi wrote: > Andrea Righi wrote: >> Vivek Goyal wrote: >> [snip] >>> Ok, I will give more details of the thought process. >>> >>> I was thinking of maintaing an rb-tree per request queue and not an >>> rb-tree per cgroup. This tree can contain all the bios submitted to that >>> request queue through __make_request(). Every node in the tree will represent >>> one cgroup and will contain a list of bios issued from the tasks from that >>> cgroup. >>> >>> Every bio entering the request queue through __make_request() function >>> first will be queued in one of the nodes in this rb-tree, depending on which >>> cgroup that bio belongs to. >>> >>> Once the bios are buffered in rb-tree, we release these to underlying >>> elevator depending on the proportionate weight of the nodes/cgroups. >>> >>> Some more details which I was trying to implement yesterday. >>> >>> There will be one bio_cgroup object per cgroup. This object will contain >>> many bio_group objects. Each bio_group object will be created for each >>> request queue where a bio from bio_cgroup is queued. Essentially the idea >>> is that bios belonging to a cgroup can be on various request queues in the >>> system. 
So a single object can not serve the purpose as it can not be on >>> many rb-trees at the same time. Hence create one sub object which will keep >>> track of bios belonging to one cgroup on a particular request queue. >>> >>> Each bio_group will contain a list of bios and this bio_group object will >>> be a node in the rb-tree of request queue. For example. Lets say there are >>> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >>> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >>> for /dev/sda and /dev/sdb. >>> >>> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >>> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >>> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >>> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >>> bios issued by task t1 for /dev/sdb. I thought the same can be extended >>> for stacked devices also. >>> >>> I am still trying to implementing it and hopefully this is doable idea. >>> I think at the end of the day it will be something very close to dm-ioband >>> algorithm just that there will be no lvm driver and no notion of separate >>> dm-ioband device. >> Vivek, thanks for the detailed explanation. Only a comment. I guess, if >> we don't change also the per-process optimizations/improvements made by >> some IO scheduler, I think we can have undesirable behaviours. >> >> For example: CFQ uses the per-process iocontext to improve fairness >> between *all* the processes in a system. But it doesn't have the concept >> that there's a cgroup context on-top-of the processes. >> >> So, some optimizations made to guarantee fairness among processes could >> conflict with algorithms implemented at the cgroup layer. And >> potentially lead to undesirable behaviours. 
>> >> For example an issue I'm experiencing with my cgroup-io-throttle >> patchset is that a cgroup can consistently increase the IO rate (always >> respecting the max limits), simply increasing the number of IO worker >> tasks respect to another cgroup with a lower number of IO workers. This >> is probably due to the fact the CFQ tries to give the same amount of >> "IO time" to all the tasks, without considering that they're organized >> in cgroup. > > BTW this is why I proposed to use a single shared iocontext for all the > processes running in the same cgroup. Anyway, this is not the best > solution, because in this way all the IO requests coming from a cgroup > will be queued to the same cfq queue. If I'm not wrong in this way we > would implement noop (FIFO) between tasks belonging to the same cgroup > and CFQ between cgroups. But, at least for this particular case, we > would be able to provide fairness among cgroups. Ah! Also have a look at this: http://download.systemimager.org/~arighi/linux/patches/io-throttle/benchmark/graph/effect-of-per-process-cfq-fairness-on-the-cgroup-context.png The graph highlights the dependency between the IO rate and the number of tasks running in a cgroup. For this testcase I've used 2 cgroups: - cgroup A, with a single task doing IO (large O_DIRECT read stream) - cgroup B, with a variable number of tasks ranging from 1 to 16 doing IO in parallel If we want to be "fair", the gap of IO performance between the cgroups should be close to 0. Using "plain" cfq (red line), the performance gap grows as the number of tasks in the cgroup increases. Using cgroup-io-throttle on top of cfq (green line), the gap is lower (the asymptotic curve is due to the bandwidth capping provided by cgroup-io-throttle). Using cgroup-io-throttle and a single shared iocontext for each cgroup (blue line), the gap is really close to 0. 
Anyway, I repeat, I don't think this is a wonderful solution; it is just meant to highlight this issue and to share with you the results of some tests I did. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48DD17A9.9080607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2008-09-26 17:30 ` Andrea Righi @ 2008-09-29 12:07 ` Hirokazu Takahashi 1 sibling, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-29 12:07 UTC (permalink / raw) To: righi.andrea-Re5JQEeQqe8AvxtiuMwx3w Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Hi, Andrea, > >> Ok, I will give more details of the thought process. > >> > >> I was thinking of maintaing an rb-tree per request queue and not an > >> rb-tree per cgroup. This tree can contain all the bios submitted to that > >> request queue through __make_request(). Every node in the tree will represent > >> one cgroup and will contain a list of bios issued from the tasks from that > >> cgroup. > >> > >> Every bio entering the request queue through __make_request() function > >> first will be queued in one of the nodes in this rb-tree, depending on which > >> cgroup that bio belongs to. > >> > >> Once the bios are buffered in rb-tree, we release these to underlying > >> elevator depending on the proportionate weight of the nodes/cgroups. > >> > >> Some more details which I was trying to implement yesterday. > >> > >> There will be one bio_cgroup object per cgroup. This object will contain > >> many bio_group objects. Each bio_group object will be created for each > >> request queue where a bio from bio_cgroup is queued. Essentially the idea > >> is that bios belonging to a cgroup can be on various request queues in the > >> system. So a single object can not serve the purpose as it can not be on > >> many rb-trees at the same time. 
Hence create one sub object which will keep > >> track of bios belonging to one cgroup on a particular request queue. > >> > >> Each bio_group will contain a list of bios and this bio_group object will > >> be a node in the rb-tree of request queue. For example. Lets say there are > >> two request queues in the system q1 and q2 (lets say they belong to /dev/sda > >> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both > >> for /dev/sda and /dev/sdb. > >> > >> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group > >> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree > >> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of > >> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of > >> bios issued by task t1 for /dev/sdb. I thought the same can be extended > >> for stacked devices also. > >> > >> I am still trying to implementing it and hopefully this is doable idea. > >> I think at the end of the day it will be something very close to dm-ioband > >> algorithm just that there will be no lvm driver and no notion of separate > >> dm-ioband device. > > > > Vivek, thanks for the detailed explanation. Only a comment. I guess, if > > we don't change also the per-process optimizations/improvements made by > > some IO scheduler, I think we can have undesirable behaviours. > > > > For example: CFQ uses the per-process iocontext to improve fairness > > between *all* the processes in a system. But it doesn't have the concept > > that there's a cgroup context on-top-of the processes. > > > > So, some optimizations made to guarantee fairness among processes could > > conflict with algorithms implemented at the cgroup layer. And > > potentially lead to undesirable behaviours. 
> > > > For example an issue I'm experiencing with my cgroup-io-throttle > > patchset is that a cgroup can consistently increase the IO rate (always > > respecting the max limits), simply increasing the number of IO worker > > tasks respect to another cgroup with a lower number of IO workers. This > > is probably due to the fact the CFQ tries to give the same amount of > > "IO time" to all the tasks, without considering that they're organized > > in cgroup. > > BTW this is why I proposed to use a single shared iocontext for all the > processes running in the same cgroup. Anyway, this is not the best > solution, because in this way all the IO requests coming from a cgroup > will be queued to the same cfq queue. If I'm not wrong in this way we > would implement noop (FIFO) between tasks belonging to the same cgroup > and CFQ between cgroups. But, at least for this particular case, we > would be able to provide fairness among cgroups. > > -Andrea I once considered the same thing, but this approach breaks compatibility. I think we should make ionice effective only for the processes in the same cgroup. A system gives some amount of bandwidth to each of its cgroups, and the processes in a given cgroup fairly share the bandwidth it was given. I think this is the straightforward approach. What do you think? I think the CFQ-cgroup patches the NEC guys are working on, the OpenVZ team's CFQ scheduler, and dm-ioband with bio-cgroup all work like this. Thank you, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 17:11 ` Andrea Righi [not found] ` <48DD17A9.9080607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-26 17:30 ` Andrea Righi 2008-09-26 17:30 ` Andrea Righi ` (2 subsequent siblings) 4 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 17:30 UTC (permalink / raw) To: Vivek Goyal Cc: Hirokazu Takahashi, ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe Andrea Righi wrote: > Andrea Righi wrote: >> Vivek Goyal wrote: >> [snip] >>> Ok, I will give more details of the thought process. >>> >>> I was thinking of maintaing an rb-tree per request queue and not an >>> rb-tree per cgroup. This tree can contain all the bios submitted to that >>> request queue through __make_request(). Every node in the tree will represent >>> one cgroup and will contain a list of bios issued from the tasks from that >>> cgroup. >>> >>> Every bio entering the request queue through __make_request() function >>> first will be queued in one of the nodes in this rb-tree, depending on which >>> cgroup that bio belongs to. >>> >>> Once the bios are buffered in rb-tree, we release these to underlying >>> elevator depending on the proportionate weight of the nodes/cgroups. >>> >>> Some more details which I was trying to implement yesterday. >>> >>> There will be one bio_cgroup object per cgroup. This object will contain >>> many bio_group objects. Each bio_group object will be created for each >>> request queue where a bio from bio_cgroup is queued. Essentially the idea >>> is that bios belonging to a cgroup can be on various request queues in the >>> system. So a single object can not serve the purpose as it can not be on >>> many rb-trees at the same time. Hence create one sub object which will keep >>> track of bios belonging to one cgroup on a particular request queue. 
>>> >>> Each bio_group will contain a list of bios and this bio_group object will >>> be a node in the rb-tree of request queue. For example. Lets say there are >>> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >>> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >>> for /dev/sda and /dev/sdb. >>> >>> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >>> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >>> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >>> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >>> bios issued by task t1 for /dev/sdb. I thought the same can be extended >>> for stacked devices also. >>> >>> I am still trying to implementing it and hopefully this is doable idea. >>> I think at the end of the day it will be something very close to dm-ioband >>> algorithm just that there will be no lvm driver and no notion of separate >>> dm-ioband device. >> Vivek, thanks for the detailed explanation. Only a comment. I guess, if >> we don't change also the per-process optimizations/improvements made by >> some IO scheduler, I think we can have undesirable behaviours. >> >> For example: CFQ uses the per-process iocontext to improve fairness >> between *all* the processes in a system. But it doesn't have the concept >> that there's a cgroup context on-top-of the processes. >> >> So, some optimizations made to guarantee fairness among processes could >> conflict with algorithms implemented at the cgroup layer. And >> potentially lead to undesirable behaviours. >> >> For example an issue I'm experiencing with my cgroup-io-throttle >> patchset is that a cgroup can consistently increase the IO rate (always >> respecting the max limits), simply increasing the number of IO worker >> tasks respect to another cgroup with a lower number of IO workers. 
This >> is probably due to the fact the CFQ tries to give the same amount of >> "IO time" to all the tasks, without considering that they're organized >> in cgroup. > > BTW this is why I proposed to use a single shared iocontext for all the > processes running in the same cgroup. Anyway, this is not the best > solution, because in this way all the IO requests coming from a cgroup > will be queued to the same cfq queue. If I'm not wrong in this way we > would implement noop (FIFO) between tasks belonging to the same cgroup > and CFQ between cgroups. But, at least for this particular case, we > would be able to provide fairness among cgroups. Ah! Also have a look at this: http://download.systemimager.org/~arighi/linux/patches/io-throttle/benchmark/graph/effect-of-per-process-cfq-fairness-on-the-cgroup-context.png The graph highlights the dependency between the IO rate and the number of tasks running in a cgroup. For this testcase I've used 2 cgroups: - cgroup A, with a single task doing IO (large O_DIRECT read stream) - cgroup B, with a variable number of tasks ranging from 1 to 16 doing IO in parallel If we want to be "fair", the IO performance gap between the cgroups should be close to 0. Using "plain" cfq (red line) the performance gap increases as the number of tasks in a cgroup grows. Using cgroup-io-throttle on top of cfq (green line) the performance gap is smaller (the asymptotic curve is due to the bandwidth capping provided by cgroup-io-throttle). Using cgroup-io-throttle and a single shared iocontext for each cgroup (blue line) the performance gap is really close to 0. Anyway, I repeat, I don't think this is a wonderful solution; it is just meant to highlight this issue and to share with you the results of some tests I did. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-29 12:07 ` Hirokazu Takahashi (?) @ 2008-09-29 12:13 ` Pavel Emelyanov -1 siblings, 0 replies; 140+ messages in thread From: Pavel Emelyanov @ 2008-09-29 12:13 UTC (permalink / raw) To: Hirokazu Takahashi Cc: righi.andrea, vgoyal, ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, agk, jens.axboe Hirokazu Takahashi wrote: > Hi, Andrea, > >>>> Ok, I will give more details of the thought process. >>>> >>>> I was thinking of maintaing an rb-tree per request queue and not an >>>> rb-tree per cgroup. This tree can contain all the bios submitted to that >>>> request queue through __make_request(). Every node in the tree will represent >>>> one cgroup and will contain a list of bios issued from the tasks from that >>>> cgroup. >>>> >>>> Every bio entering the request queue through __make_request() function >>>> first will be queued in one of the nodes in this rb-tree, depending on which >>>> cgroup that bio belongs to. >>>> >>>> Once the bios are buffered in rb-tree, we release these to underlying >>>> elevator depending on the proportionate weight of the nodes/cgroups. >>>> >>>> Some more details which I was trying to implement yesterday. >>>> >>>> There will be one bio_cgroup object per cgroup. This object will contain >>>> many bio_group objects. Each bio_group object will be created for each >>>> request queue where a bio from bio_cgroup is queued. Essentially the idea >>>> is that bios belonging to a cgroup can be on various request queues in the >>>> system. So a single object can not serve the purpose as it can not be on >>>> many rb-trees at the same time. Hence create one sub object which will keep >>>> track of bios belonging to one cgroup on a particular request queue. >>>> >>>> Each bio_group will contain a list of bios and this bio_group object will >>>> be a node in the rb-tree of request queue. For example. 
Lets say there are >>>> two request queues in the system q1 and q2 (lets say they belong to /dev/sda >>>> and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both >>>> for /dev/sda and /dev/sdb. >>>> >>>> bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group >>>> objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree >>>> and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of >>>> bios issued by task t1 for /dev/sda and bio_group2 will contain a list of >>>> bios issued by task t1 for /dev/sdb. I thought the same can be extended >>>> for stacked devices also. >>>> >>>> I am still trying to implementing it and hopefully this is doable idea. >>>> I think at the end of the day it will be something very close to dm-ioband >>>> algorithm just that there will be no lvm driver and no notion of separate >>>> dm-ioband device. >>> Vivek, thanks for the detailed explanation. Only a comment. I guess, if >>> we don't change also the per-process optimizations/improvements made by >>> some IO scheduler, I think we can have undesirable behaviours. >>> >>> For example: CFQ uses the per-process iocontext to improve fairness >>> between *all* the processes in a system. But it doesn't have the concept >>> that there's a cgroup context on-top-of the processes. >>> >>> So, some optimizations made to guarantee fairness among processes could >>> conflict with algorithms implemented at the cgroup layer. And >>> potentially lead to undesirable behaviours. >>> >>> For example an issue I'm experiencing with my cgroup-io-throttle >>> patchset is that a cgroup can consistently increase the IO rate (always >>> respecting the max limits), simply increasing the number of IO worker >>> tasks respect to another cgroup with a lower number of IO workers. This >>> is probably due to the fact the CFQ tries to give the same amount of >>> "IO time" to all the tasks, without considering that they're organized >>> in cgroup. 
>> BTW this is why I proposed to use a single shared iocontext for all the >> processes running in the same cgroup. Anyway, this is not the best >> solution, because in this way all the IO requests coming from a cgroup >> will be queued to the same cfq queue. If I'm not wrong in this way we >> would implement noop (FIFO) between tasks belonging to the same cgroup >> and CFQ between cgroups. But, at least for this particular case, we >> would be able to provide fairness among cgroups. >> >> -Andrea > > I ever thought the same thing but this approach breaks the compatibility. > I think we should make ionice only effective for the processes in the > same cgroup. > > A system gives some amount of bandwidths to its cgroups, and > the processes in one of the cgroups fairly share the given bandwidth. > I think this is the straight approach. What do you think? > > I think all the CFQ-cgroup the NEC guys are working, OpenVZ team's CFQ > scheduler and dm-ioband with bio-cgroup work like this. If by "fairly share the given bandwidth" you mean "share according to their IO-nice values", then you're right on this, Hirokazu. We always use two-level schedulers and would like to see the same behavior in anything that becomes the IO-bandwidth controller in mainline :) > Thank you, > Hirokazu Takahashi. > > ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  2008-09-26 17:11 ` Andrea Righi
  ` (3 preceding siblings ...)
  2008-09-29 12:07 ` Hirokazu Takahashi
@ 2008-09-29 12:07 ` Hirokazu Takahashi
  4 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-29 12:07 UTC (permalink / raw)
To: righi.andrea
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, balbir

Hi, Andrea,

> >> Ok, I will give more details of the thought process.
> >>
> >> I was thinking of maintaining an rb-tree per request queue, not an rb-tree per cgroup. This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued by the tasks of that cgroup.
> >>
> >> Every bio entering the request queue through the __make_request() function will first be queued in one of the nodes of this rb-tree, depending on which cgroup the bio belongs to.
> >>
> >> Once the bios are buffered in the rb-tree, we release them to the underlying elevator depending on the proportionate weight of the nodes/cgroups.
> >>
> >> Some more details which I was trying to implement yesterday.
> >>
> >> There will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. A bio_group object will be created for each request queue where a bio from the bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system, so a single object cannot serve the purpose, as it cannot be on many rb-trees at the same time. Hence we create one sub-object which keeps track of the bios belonging to one cgroup on a particular request queue.
> >>
> >> Each bio_group will contain a list of bios, and this bio_group object will be a node in the rb-tree of the request queue. For example, let's say there are two request queues in the system, q1 and q2 (say they belong to /dev/sda and /dev/sdb), and a task t1 in /cgroup/io/test1 is issuing I/O both for /dev/sda and /dev/sdb.
> >>
> >> The bio_cgroup belonging to /cgroup/io/test1 will have two bio_group sub-objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same could be extended to stacked devices as well.
> >>
> >> I am still trying to implement it, and hopefully the idea is workable. I think at the end of the day it will be something very close to the dm-ioband algorithm, just that there will be no lvm driver and no notion of a separate dm-ioband device.
> >
> > Vivek, thanks for the detailed explanation. Only a comment: I guess that if we don't also change the per-process optimizations/improvements made by some I/O schedulers, we can get undesirable behaviours.
> >
> > For example: CFQ uses the per-process io_context to improve fairness between *all* the processes in a system. But it doesn't have the concept that there's a cgroup context on top of the processes.
> >
> > So some optimizations made to guarantee fairness among processes could conflict with algorithms implemented at the cgroup layer, and potentially lead to undesirable behaviours.
> >
> > For example, an issue I'm experiencing with my cgroup-io-throttle patchset is that a cgroup can consistently increase its I/O rate (always respecting the max limits) simply by increasing its number of I/O worker tasks with respect to another cgroup that has fewer. This is probably due to the fact that CFQ tries to give the same amount of "IO time" to all the tasks, without considering that they're organized in cgroups.
>
> BTW, this is why I proposed using a single shared io_context for all the processes running in the same cgroup. Anyway, this is not the best solution, because that way all the I/O requests coming from a cgroup will be queued to the same cfq queue. If I'm not wrong, we would then implement noop (FIFO) between tasks belonging to the same cgroup and CFQ between cgroups. But, at least for this particular case, we would be able to provide fairness among cgroups.
>
> -Andrea

I once thought the same thing, but that approach breaks compatibility. I think we should make ionice effective only between processes in the same cgroup.

A system gives some amount of bandwidth to each of its cgroups, and the processes in one of those cgroups fairly share the bandwidth given to it. I think this is the straightforward approach. What do you think?

I think the CFQ-cgroup work the NEC guys are doing, the OpenVZ team's CFQ scheduler and dm-ioband with bio-cgroup all work like this.

Thank you,
Hirokazu Takahashi.

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  2008-09-24 14:03 ` Vivek Goyal
  ` (2 preceding siblings ...)
@ 2008-09-26 16:11 ` Andrea Righi
  -1 siblings, 0 replies; 140+ messages in thread
From: Andrea Righi @ 2008-09-26 16:11 UTC (permalink / raw)
To: Vivek Goyal
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, Hirokazu Takahashi, dm-devel, agk, xemul, fernando, balbir

Vivek Goyal wrote:
[snip]
> Ok, I will give more details of the thought process.
>
> I was thinking of maintaining an rb-tree per request queue, not an rb-tree per cgroup. This tree can contain all the bios submitted to that request queue through __make_request(). Every node in the tree will represent one cgroup and will contain a list of bios issued by the tasks of that cgroup.
>
> Every bio entering the request queue through the __make_request() function will first be queued in one of the nodes of this rb-tree, depending on which cgroup the bio belongs to.
>
> Once the bios are buffered in the rb-tree, we release them to the underlying elevator depending on the proportionate weight of the nodes/cgroups.
>
> Some more details which I was trying to implement yesterday.
>
> There will be one bio_cgroup object per cgroup. This object will contain many bio_group objects. A bio_group object will be created for each request queue where a bio from the bio_cgroup is queued. Essentially the idea is that bios belonging to a cgroup can be on various request queues in the system, so a single object cannot serve the purpose, as it cannot be on many rb-trees at the same time. Hence we create one sub-object which keeps track of the bios belonging to one cgroup on a particular request queue.
>
> Each bio_group will contain a list of bios, and this bio_group object will be a node in the rb-tree of the request queue. For example, let's say there are two request queues in the system, q1 and q2 (say they belong to /dev/sda and /dev/sdb), and a task t1 in /cgroup/io/test1 is issuing I/O both for /dev/sda and /dev/sdb.
>
> The bio_cgroup belonging to /cgroup/io/test1 will have two bio_group sub-objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of bios issued by task t1 for /dev/sda and bio_group2 will contain a list of bios issued by task t1 for /dev/sdb. I thought the same could be extended to stacked devices as well.
>
> I am still trying to implement it, and hopefully the idea is workable. I think at the end of the day it will be something very close to the dm-ioband algorithm, just that there will be no lvm driver and no notion of a separate dm-ioband device.

Vivek, thanks for the detailed explanation. Only a comment: I guess that if we don't also change the per-process optimizations/improvements made by some I/O schedulers, we can get undesirable behaviours.

For example: CFQ uses the per-process io_context to improve fairness between *all* the processes in a system. But it doesn't have the concept that there's a cgroup context on top of the processes.

So some optimizations made to guarantee fairness among processes could conflict with algorithms implemented at the cgroup layer, and potentially lead to undesirable behaviours.

For example, an issue I'm experiencing with my cgroup-io-throttle patchset is that a cgroup can consistently increase its I/O rate (always respecting the max limits) simply by increasing its number of I/O worker tasks with respect to another cgroup that has fewer. This is probably due to the fact that CFQ tries to give the same amount of "IO time" to all the tasks, without considering that they're organized in cgroups.

I don't see this behaviour with noop or deadline, because they don't have the concept of io_context.

-Andrea

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  2008-09-22 14:30 ` Vivek Goyal
@ 2008-09-24 10:18 ` Hirokazu Takahashi
  2008-09-24  8:29 ` Hirokazu Takahashi
  ` (5 subsequent siblings)
  6 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-24 10:18 UTC (permalink / raw)
To: vgoyal
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir

Hi,

> > > > > To avoid stacking yet another device (dm-ioband) on top of every device we want to subject to rules, I was thinking of maintaining an rb-tree per request queue. Requests will first go into this rb-tree upon __make_request() and will then filter down to the elevator associated with the queue (if there is one). This will give us control over releasing bios to the elevator based on policies (proportional weight, max bandwidth, etc.) with no need to stack an additional block device.
> > > >
> > > > I think it's a bit late to control I/O requests there, since a process may be blocked in get_request_wait when the I/O load is high. Please imagine the situation where cgroups with low bandwidths are consuming most of the "struct request"s while another cgroup with a high bandwidth is blocked and can't get enough "struct request"s.
> > > >
> > > > It means cgroups that issue lots of I/O requests can win the game.
> > >
> > > Ok, this is a good point. Because the number of struct requests is limited and they seem to be allocated on a first-come first-served basis, a cgroup generating a lot of I/O might win.
> > >
> > > But dm-ioband will face the same issue.
> >
> > Nope. Dm-ioband doesn't have this issue, since it works before the descriptors are allocated. Only I/O requests dm-ioband has passed can allocate a descriptor.
>
> Ok. Got it. dm-ioband does not block on allocation of request descriptors. It does seem to block in prevent_burst_bios(), but that is per group, so it should be fine.

Yes. There is also another small mechanism: prevent_burst_bios() tries not to block kernel threads if possible.

> That means for the lower layers, one shall have to do request descriptor allocation as per the cgroup weight, to make sure a cgroup with a lower weight does not get a higher % of the disk because it is generating more requests.

Yes. But when cgroups with higher weights aren't issuing a lot of I/O, even a cgroup with a lower weight can allocate a lot of request descriptors.

> One additional issue with my scheme I just noticed is that I am putting bio-cgroup in the rb-tree. If there are stacked devices, then bios/requests from the same cgroup can be at multiple levels of processing at the same time. That would mean that a single cgroup needs to be in multiple rb-trees at the same time in various layers. So I might have to create a temporary object which can be associated with the cgroup, and get rid of that object once I don't have the requests any more...

You mean each layer should have its own rb-tree? Is it per device? One lvm logical volume may well consist of several physical volumes, which will be shared with other logical volumes. And some layers may split one bio into several bios. I can hardly imagine what these structures will look like. But I guess it is a good thing that we are going to support a general infrastructure for I/O requests.

> Well, implementing an rb-tree per request queue seems to be harder than I had thought, especially taking care of decoupling the elevator and request descriptor logic at the lower layers. Long way to go..

Thanks,
Hirokazu Takahashi.

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 14:52 ` Vivek Goyal @ 2008-09-26 12:42 ` Hirokazu Takahashi 0 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-26 12:42 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > > One additional issue with my scheme I just noticed is that I am putting > > > bio-cgroups in an rb-tree. If there are stacked devices, then bios/requests from the > > > same cgroup can be at multiple levels of processing at the same time. That > > > would mean that a single cgroup needs to be in multiple rb-trees at the > > > same time in various layers. So I might have to create a temporary object > > > which can be associated with a cgroup, and get rid of that object once I don't > > > have the requests any more... > > > > You mean each layer should have its own rb-tree? Is it per device? > > One lvm logical volume may consist of several physical > > volumes, which will be shared with other logical volumes. > > And some layers may split one bio into several bios. > > I can hardly imagine how these structures will be. > > > > Yes, one rb-tree per device, be it a physical device or a logical device > (because there is one request queue associated with each physical/logical block > device). No, logical block devices don't have any request queues, and they essentially won't block any bios unless it is impossible to handle them at the moment. Device-mappers never touch any request queues. > I was thinking of grabbing/hijacking the bios as soon as they are > submitted to the device via the associated request function. So if there > is a logical device built on top of two physical devices, the associated > bio-copy or other logic should not even see the bio the moment it is > submitted to the device. It will see the bio only when it is released > to them from the associated rb-tree. Do you think this will not work? To me > this is what dm-ioband is doing logically. The only difference is that it > does this with the help of a separate request queue. I think it's easy to just make all logical devices --- device-mapper devices --- and all physical devices have their own bandwidth control mechanisms. But it's not clear to me how your algorithm works to control the bandwidth. At which level are you going to guarantee the bandwidth: at the logical volume layer, such as lvm, or at the physical device layer? Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080922143042.GA19222-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-09-24 8:29 ` Hirokazu Takahashi 2008-09-24 10:18 ` Hirokazu Takahashi 2008-09-24 10:34 ` Hirokazu Takahashi 2 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-24 8:29 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > > > > > I have got excellent results with dm-ioband, which controls the disk I/O > > > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > > > > > This time, I ran some benchmarks with high-end storage. The > > > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > > > such as seek time. > > > > > > > > > > > > You can see the details of the benchmarks at: > > > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > (snip) > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > > > every device we want to control using rules? This looks a little odd, > > > > > at least to me. Can't we keep it in line with the rest of the controllers, > > > > > where task grouping takes place using cgroups and rules are specified in > > > > > the cgroup itself (the way Andrea Righi does for the io-throttling patches)? > > > > It isn't essential that dm-ioband is implemented as one of the device-mappers. > > > > I've also been considering that this algorithm itself could be implemented > > > > in the block layer directly. > > > > > > > > Although, the current implementation has merits. It is flexible. > > > > - Dm-ioband can be placed anywhere you like, which may be right before > > > > the I/O schedulers or may be placed on top of LVM devices. > > > Hi, > > > An rb-tree per request queue should also be able to give us this > > > flexibility. Because the logic is implemented per request queue, rules can be > > > placed at any layer: either at the bottom-most layer, where requests are > > > passed to the elevator, or at a higher layer, where requests will be passed to > > > lower-level block devices in the stack. Just that we shall have to modify > > > some of the higher-level dm/md drivers to make use of > > > queuing cgroup requests and releasing cgroup requests to lower layers. > > Request descriptors are allocated just before passing I/O requests > > to the elevators. Even if you move the descriptor allocation point > > before calling the dm/md drivers, the drivers can't make use of them. > > You are right. Request descriptors are currently allocated at the bottom-most > layer. Anyway, in the rb-tree, we put bio-cgroups as logical elements, > and every bio-cgroup then contains a list of either bios or request > descriptors. So what kind of list a bio-cgroup maintains can depend on > whether it is a higher-layer driver (which will maintain bios) or a lower-layer > driver (which will maintain a list of request descriptors per bio-cgroup). I'm getting confused about your idea. I thought you wanted to make each cgroup have its own rb-tree, and wanted to make all the layers share the same rb-tree. If so, are you going to put different things into the same tree? Do you even want all the I/O schedulers to use the same tree? Are you going to block request descriptors in the tree? From the viewpoint of performance, all the request descriptors should be passed to the I/O schedulers, since the maximum number of request descriptors is limited. And I still don't understand: if you want to make your rb-tree work efficiently, you need to put a lot of bios or request descriptors into the tree. Is that what you are going to do? On the other hand, dm-ioband tries to minimize the number of blocked bios. And I have a plan to reduce the maximum number that can be blocked there. Sorry to bother you; I just don't understand the concept clearly. > So basically the mechanism of maintaining an rb-tree can be completely > ignorant of whether a driver is keeping track of bios or keeping > track of requests per cgroup. I don't care whether the queue is implemented as an rb-tree or some kind of list, because they are logically the same thing. > > When one of the dm drivers accepts an I/O request, the request > > won't have either a real device number or a real sector number. > > The request will be re-mapped to another sector of another device > > in every dm driver. The request may even be replicated there. > > So it is really hard to find the right request queue to put > > the request into and sort them on the queue. > > Hmm.., I thought that all the incoming requests to a dm/md driver will > remain in a single queue maintained by that driver (irrespective of > which request queues these requests go into in lower layers after > replication or other operations). I am not very familiar with the dm/md > implementation. I will read more about it.... They never look into the request queues. Some of them have their own little queues and others don't. Some may just modify the sector numbers of I/O requests or may create new I/O requests themselves. Others such as md-raid5 have their own queues to control I/Os, where a write request may cause several read requests and has to wait for their completion before the actual write starts. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  [not found] ` <20080922143042.GA19222-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2008-09-24  8:29   ` Hirokazu Takahashi
@ 2008-09-24 10:18   ` Hirokazu Takahashi
  2008-09-24 10:34   ` Hirokazu Takahashi
  2 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-24 10:18 UTC (permalink / raw)
To: vgoyal
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, righi.andrea, agk, xemul, fernando, balbir

Hi,

> > > > > To avoid creation of stacking another device (dm-ioband) on top of every
> > > > > device we want to subject to rules, I was thinking of maintaining an
> > > > > rb-tree per request queue. Requests will first go into this rb-tree upon
> > > > > __make_request() and then will filter down to elevator associated with the
> > > > > queue (if there is one). This will provide us the control of releasing
> > > > > bio's to elevaor based on policies (proportional weight, max bandwidth
> > > > > etc) and no need of stacking additional block device.
> > > >
> > > > I think it's a bit late to control I/O requests there, since process
> > > > may be blocked in get_request_wait when the I/O load is high.
> > > > Please imagine the situation that cgroups with low bandwidths are
> > > > consuming most of "struct request"s while another cgroup with a high
> > > > bandwidth is blocked and can't get enough "struct request"s.
> > > >
> > > > It means cgroups that issues lot of I/O request can win the game.
> > >
> > > Ok, this is a good point. Because number of struct requests are limited
> > > and they seem to be allocated on first come first serve basis, so if a
> > > cgroup is generating lot of IO, then it might win.
> > >
> > > But dm-ioband will face the same issue.
> >
> > Nope. Dm-ioband doesn't have this issue since it works before allocating
> > the descriptors. Only I/O requests dm-ioband has passed can allocate its
> > descriptor.
>
> Ok. Got it. dm-ioband does not block on allocation of request descriptors.
> It does seem to be blocking in prevent_burst_bios() but that would be
> per group so it should be fine.

Yes. There is also another little mechanism: prevent_burst_bios()
tries not to block kernel threads if possible.

> That means for lower layers, one shall have to do request descritor
> allocation as per the cgroup weight to make sure a cgroup with lower
> weight does not get higher % of disk because it is generating more
> requests.

Yes. But when cgroups with higher weights aren't issuing a lot of I/Os,
even a cgroup with a lower weight can allocate a lot of request
descriptors.

> One additional issue with my scheme I just noticed is that I am putting
> bio-cgroup in rb-tree. If there are stacked devices then bio/requests from
> same cgroup can be at multiple levels of processing at same time. That
> would mean that a single cgroup needs to be in multiple rb-trees at the
> same time in various layers. So I might have to create a temporary object
> which can associate with cgroup and get rid of that object once I don't
> have the requests any more...

You mean each layer should have its own rb-tree? Is it per device?
One lvm logical volume will probably consist of several physical
volumes, which may be shared with other logical volumes. And some
layers may split one bio into several bios. I can hardly imagine what
these structures will look like. But I guess it is a good thing that we
are going to support a general infrastructure for I/O requests.

> Well, implementing rb-tree per request queue seems to be harder than I
> had thought. Especially taking care of decoupling the elevator and reqeust
> descriptor logic at lower layers. Long way to go..

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 140+ messages in thread
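The starvation argument above can be made concrete with a small simulation: a fixed pool of request descriptors handed out first-come-first-served lets whichever group floods first monopolize the pool, while a per-group cap (in the spirit of what prevent_burst_bios() is described as doing, though not its real implementation) leaves descriptors for the rest. Group names, pool size, and the cap are made-up illustrative values:

```python
# Model a fixed descriptor pool. With no cap, allocation is strictly
# first-come-first-served; with a per-group cap, a flooding group is
# throttled and later arrivals from other groups still get descriptors.
def allocate(arrivals, pool_size, per_group_cap=None):
    held = {}                          # group -> descriptors currently held
    for group in arrivals:             # requests in arrival order
        if sum(held.values()) >= pool_size:
            break                      # pool exhausted: later arrivals block
        if per_group_cap is not None and held.get(group, 0) >= per_group_cap:
            continue                   # this group is throttled; others go on
        held[group] = held.get(group, 0) + 1
    return held

arrivals = ["low"] * 120 + ["high"] * 20   # low-weight group floods first
fcfs = allocate(arrivals, pool_size=128)
capped = allocate(arrivals, pool_size=128, per_group_cap=64)
```

In the uncapped run the "high" group is left with only the pool remainder despite its weight; with the cap, all of its requests get descriptors.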
* Re: dm-ioband + bio-cgroup benchmarks
  [not found] ` <20080922143042.GA19222-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2008-09-24  8:29   ` Hirokazu Takahashi
  2008-09-24 10:18   ` Hirokazu Takahashi
@ 2008-09-24 10:34   ` Hirokazu Takahashi
  2 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-24 10:34 UTC (permalink / raw)
To: vgoyal
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, righi.andrea, agk, xemul, fernando, balbir

Hi,

> > It's possible the algorithm of dm-ioband can be placed in the block layer
> > if it is really a big problem.
> > But I doubt it can control every control block I/O as we wish since
> > the interface the cgroup supports is quite poor.
>
> Had a question regarding cgroup interface. I am assuming that in a system,
> one will be using other controllers as well apart from IO-controller.
> Other controllers will be using cgroup as a grouping mechanism.
> Now coming up with additional grouping mechanism for only io-controller seems
> little odd to me. It will make the job of higher level management software
> harder.
>
> Looking at the dm-ioband grouping examples given in patches, I think cases
> of grouping based in pid, pgrp, uid and kvm can be handled by creating right
> cgroup and making sure applications are launched/moved into right cgroup by
> user space tools.

Grouping by pid, pgrp and uid is not the point; I've been thinking it
can be replaced with cgroup once the implementation of bio-cgroup is
done.

I think the problem with cgroup is that it can't support lots of
storage and hotplug devices; it just handles them all as if they were
one resource. I don't insist that the interface of dm-ioband is the
best. I just hope the cgroup infrastructure will support this kind of
resource.

> I think keeping grouping mechanism in line with rest of the controllers
> should help because a uniform grouping mechanism should make life simpler.
>
> I am not very sure about moving dm-ioband algorithm in block layer. Looks
> like it will make life simpler at least in terms of configuration.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  2008-09-24 10:34 ` Hirokazu Takahashi
@ 2008-09-24 12:38   ` Balbir Singh
  0 siblings, 0 replies; 140+ messages in thread
From: Balbir Singh @ 2008-09-24 12:38 UTC (permalink / raw)
To: Hirokazu Takahashi
Cc: vgoyal, ryov, linux-kernel, dm-devel, containers, virtualization,
	xen-devel, fernando, xemul, agk, righi.andrea, jens.axboe

Hirokazu Takahashi wrote:
> Hi,
>
>>> It's possible the algorithm of dm-ioband can be placed in the block layer
>>> if it is really a big problem.
>>> But I doubt it can control every control block I/O as we wish since
>>> the interface the cgroup supports is quite poor.
>> Had a question regarding cgroup interface. I am assuming that in a system,
>> one will be using other controllers as well apart from IO-controller.
>> Other controllers will be using cgroup as a grouping mechanism.
>> Now coming up with additional grouping mechanism for only io-controller seems
>> little odd to me. It will make the job of higher level management software
>> harder.
>>
>> Looking at the dm-ioband grouping examples given in patches, I think cases
>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right
>> cgroup and making sure applications are launched/moved into right cgroup by
>> user space tools.
>
> Grouping in pid, pgrp and uid is not the point, which I've been thinking
> can be replaced with cgroup once the implementation of bio-cgroup is done.
>
> I think problems of cgroup are that they can't support lots of storages
> and hotplug devices, it just handle them as if they were just one resource.

Could you elaborate on this please?

> I don't insist the interface of dm-ioband is the best. I just hope the
> cgroup infrastructure support this kind of resources.

What sort of support will help you?

--
Balbir

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  [not found] ` <20080924.193414.22923673.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
  2008-09-24 12:38   ` Balbir Singh
@ 2008-09-24 14:53   ` Vivek Goyal
  1 sibling, 0 replies; 140+ messages in thread
From: Vivek Goyal @ 2008-09-24 14:53 UTC (permalink / raw)
To: Hirokazu Takahashi
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, righi.andrea, agk, xemul, fernando, balbir

On Wed, Sep 24, 2008 at 07:34:14PM +0900, Hirokazu Takahashi wrote:
> Hi,
>
> > > It's possible the algorithm of dm-ioband can be placed in the block layer
> > > if it is really a big problem.
> > > But I doubt it can control every control block I/O as we wish since
> > > the interface the cgroup supports is quite poor.
> >
> > Had a question regarding cgroup interface. I am assuming that in a system,
> > one will be using other controllers as well apart from IO-controller.
> > Other controllers will be using cgroup as a grouping mechanism.
> > Now coming up with additional grouping mechanism for only io-controller seems
> > little odd to me. It will make the job of higher level management software
> > harder.
> >
> > Looking at the dm-ioband grouping examples given in patches, I think cases
> > of grouping based in pid, pgrp, uid and kvm can be handled by creating right
> > cgroup and making sure applications are launched/moved into right cgroup by
> > user space tools.
>
> Grouping in pid, pgrp and uid is not the point, which I've been thinking
> can be replaced with cgroup once the implementation of bio-cgroup is done.
>
> I think problems of cgroup are that they can't support lots of storages
> and hotplug devices, it just handle them as if they were just one resource.
> I don't insist the interface of dm-ioband is the best. I just hope the
> cgroup infrastructure support this kind of resources.

Sorry, I did not understand fully. Can you please explain in detail what
kind of situation will not be covered by the cgroup interface?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks
  [not found] ` <20080924145331.GD547-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-09-26 13:04   ` Hirokazu Takahashi
  0 siblings, 0 replies; 140+ messages in thread
From: Hirokazu Takahashi @ 2008-09-26 13:04 UTC (permalink / raw)
To: vgoyal
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization,
	dm-devel, righi.andrea, agk, ryov, xemul, fernando, balbir

Hi,

> > > > It's possible the algorithm of dm-ioband can be placed in the block layer
> > > > if it is really a big problem.
> > > > But I doubt it can control every control block I/O as we wish since
> > > > the interface the cgroup supports is quite poor.
> > >
> > > Had a question regarding cgroup interface. I am assuming that in a system,
> > > one will be using other controllers as well apart from IO-controller.
> > > Other controllers will be using cgroup as a grouping mechanism.
> > > Now coming up with additional grouping mechanism for only io-controller seems
> > > little odd to me. It will make the job of higher level management software
> > > harder.
> > >
> > > Looking at the dm-ioband grouping examples given in patches, I think cases
> > > of grouping based in pid, pgrp, uid and kvm can be handled by creating right
> > > cgroup and making sure applications are launched/moved into right cgroup by
> > > user space tools.
> >
> > Grouping in pid, pgrp and uid is not the point, which I've been thinking
> > can be replaced with cgroup once the implementation of bio-cgroup is done.
> >
> > I think problems of cgroup are that they can't support lots of storages
> > and hotplug devices, it just handle them as if they were just one resource.
> > I don't insist the interface of dm-ioband is the best. I just hope the
> > cgroup infrastructure support this kind of resources.
>
> Sorry, I did not understand fully. Can you please explain in detail what
> kind of situation will not be covered by cgroup interface.

Going by the concept of the cgroup: if you want to control several disks
independently, you would have to make each disk its own cgroup
subsystem, which can only be defined when compiling the kernel. This is
impossible because every linux box has a different number of disks.

You might think it possible, as a workaround, to make each cgroup have
lots of control files, one per device. But it isn't allowed to
add/remove control files when devices are hot-added or hot-removed.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 140+ messages in thread
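The hot-plug objection above can be sketched with a toy model: instead of one static control file per device, a single rule table keyed by device number accommodates devices that appear after configuration time. The class name and the "major:minor weight" line format are hypothetical illustrations, not an actual kernel interface of the period:

```python
# Toy model of per-device rules behind a single control file. A device
# that is hot-added needs no new file: lookups for unknown devices just
# fall back to a default weight, and a rule can be written at any time.
class PerDeviceWeights:
    def __init__(self, default=100):
        self.default = default
        self.rules = {}               # "major:minor" -> weight

    def write(self, line):
        """Accept one rule line, e.g. "8:16 500"."""
        dev, weight = line.split()
        self.rules[dev] = int(weight)

    def weight_for(self, dev):
        # Hot-added (unknown) devices simply get the default weight.
        return self.rules.get(dev, self.default)

w = PerDeviceWeights()
w.write("8:16 500")                   # give one device an explicit weight
```

This is the shape of interface the discussion is reaching for: the set of controlled devices is data, not compile-time structure, so hot-add and hot-remove need no change to the control files themselves.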
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 14:53 ` Vivek Goyal @ 2008-09-26 13:04 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-26 13:04 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, ryov, xemul, fernando, balbir Hi, > > > > It's possible the algorithm of dm-ioband can be placed in the block layer > > > > if it is really a big problem. > > > > But I doubt it can control every control block I/O as we wish since > > > > the interface the cgroup supports is quite poor. > > > > > > Had a question regarding cgroup interface. I am assuming that in a system, > > > one will be using other controllers as well apart from IO-controller. > > > Other controllers will be using cgroup as a grouping mechanism. > > > Now coming up with additional grouping mechanism for only io-controller seems > > > little odd to me. It will make the job of higher level management software > > > harder. > > > > > > Looking at the dm-ioband grouping examples given in patches, I think cases > > > of grouping based in pid, pgrp, uid and kvm can be handled by creating right > > > cgroup and making sure applications are launched/moved into right cgroup by > > > user space tools. > > > > Grouping in pid, pgrp and uid is not the point, which I've been thinking > > can be replaced with cgroup once the implementation of bio-cgroup is done. > > > > I think problems of cgroup are that they can't support lots of storages > > and hotplug devices, it just handle them as if they were just one resource. > > I don't insist the interface of dm-ioband is the best. I just hope the > > cgroup infrastructure support this kind of resources. > > > > Sorry, I did not understand fully. Can you please explain in detail what > kind of situation will not be covered by cgroup interface. 
Under the cgroup model, if you want to control several disks independently, each disk would need its own cgroup subsystem, and subsystems can only be defined at kernel compile time. That is impossible, because every Linux box has a different number of disks. You might think a workaround is to give each cgroup a separate control file per device, but control files cannot be added or removed when devices are hot-added or hot-removed. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 13:04 ` Hirokazu Takahashi @ 2008-09-26 15:56 ` Andrea Righi -1 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 15:56 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, balbir Hirokazu Takahashi wrote: > Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer >>>>> if it is really a big problem. >>>>> But I doubt it can control every control block I/O as we wish since >>>>> the interface the cgroup supports is quite poor. >>>> Had a question regarding cgroup interface. I am assuming that in a system, >>>> one will be using other controllers as well apart from IO-controller. >>>> Other controllers will be using cgroup as a grouping mechanism. >>>> Now coming up with additional grouping mechanism for only io-controller seems >>>> little odd to me. It will make the job of higher level management software >>>> harder. >>>> >>>> Looking at the dm-ioband grouping examples given in patches, I think cases >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right >>>> cgroup and making sure applications are launched/moved into right cgroup by >>>> user space tools. >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking >>> can be replaced with cgroup once the implementation of bio-cgroup is done. >>> >>> I think problems of cgroup are that they can't support lots of storages >>> and hotplug devices, it just handle them as if they were just one resource. >>> I don't insist the interface of dm-ioband is the best. I just hope the >>> cgroup infrastructure support this kind of resources. >>> >> Sorry, I did not understand fully. Can you please explain in detail what >> kind of situation will not be covered by cgroup interface. 
> > From the concept of the cgroup, if you want control several disks > independently, you should make each disk have its own cgroup subsystem, > which only can be defined when compiling the kernel. This is impossible > because every linux box has various number of disks. mmh? not true. You can define a single cgroup subsystem that implements the appropriate interfaces for your type of control, and use many dynamically allocated structures, one per controlled object (block device, disk, partition, ..., or any other grouping/splitting policy). Actually, this is how cgroup-io-throttle, like any other cgroup subsystem, is implemented. > So you think it may be possible to make each cgroup have lots of control > files for each device as a workaround. But it isn't allowed to add/remove > control files when some devices are hot-added or hot-removed. Why not a single control file for all the devices? -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 15:56 ` Andrea Righi @ 2008-09-29 10:40 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-29 10:40 UTC (permalink / raw) To: righi.andrea Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, ryov, xemul, fernando, vgoyal, balbir Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer > >>>>> if it is really a big problem. > >>>>> But I doubt it can control every control block I/O as we wish since > >>>>> the interface the cgroup supports is quite poor. > >>>> Had a question regarding cgroup interface. I am assuming that in a system, > >>>> one will be using other controllers as well apart from IO-controller. > >>>> Other controllers will be using cgroup as a grouping mechanism. > >>>> Now coming up with additional grouping mechanism for only io-controller seems > >>>> little odd to me. It will make the job of higher level management software > >>>> harder. > >>>> > >>>> Looking at the dm-ioband grouping examples given in patches, I think cases > >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right > >>>> cgroup and making sure applications are launched/moved into right cgroup by > >>>> user space tools. > >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking > >>> can be replaced with cgroup once the implementation of bio-cgroup is done. > >>> > >>> I think problems of cgroup are that they can't support lots of storages > >>> and hotplug devices, it just handle them as if they were just one resource. > >>> I don't insist the interface of dm-ioband is the best. I just hope the > >>> cgroup infrastructure support this kind of resources. > >>> > >> Sorry, I did not understand fully. Can you please explain in detail what > >> kind of situation will not be covered by cgroup interface. 
> > > > From the concept of the cgroup, if you want control several disks > > independently, you should make each disk have its own cgroup subsystem, > > which only can be defined when compiling the kernel. This is impossible > > because every linux box has various number of disks. > > mmh? not true. You can define a single cgroup subsystem that implements > the opportune interfaces to apply your type of control, and use many > structures allocated dynamically for each controlled object (one for > each block device, disk, partition, ... or using any kind of > grouping/splitting policy). Actually, this is how cgroup-io-throttle, as > well as any other cgroup subsystem, is implemented. > > > So you think it may be possible to make each cgroup have lots of control > > files for each device as a workaround. But it isn't allowed to add/remove > > control files when some devices are hot-added or hot-removed. > > Why not a single control file for all the devices? This is possible, but I wonder if it is really the way we should go. It looks like you have implemented another ioctl-like interface on top of the cgroup control file interface. You can do anything you want with such an interface, but I think there should at least be some rules for this kind of ioctl-like interface if we don't want to enhance the cgroup infrastructure itself. Thank you, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks @ 2008-09-29 10:40 ` Hirokazu Takahashi 0 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-29 10:40 UTC (permalink / raw) To: righi.andrea Cc: vgoyal, ryov, linux-kernel, dm-devel, containers, virtualization, xen-devel, fernando, balbir, xemul, agk, jens.axboe Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer > >>>>> if it is really a big problem. > >>>>> But I doubt it can control every control block I/O as we wish since > >>>>> the interface the cgroup supports is quite poor. > >>>> Had a question regarding cgroup interface. I am assuming that in a system, > >>>> one will be using other controllers as well apart from IO-controller. > >>>> Other controllers will be using cgroup as a grouping mechanism. > >>>> Now coming up with additional grouping mechanism for only io-controller seems > >>>> little odd to me. It will make the job of higher level management software > >>>> harder. > >>>> > >>>> Looking at the dm-ioband grouping examples given in patches, I think cases > >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right > >>>> cgroup and making sure applications are launched/moved into right cgroup by > >>>> user space tools. > >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking > >>> can be replaced with cgroup once the implementation of bio-cgroup is done. > >>> > >>> I think problems of cgroup are that they can't support lots of storages > >>> and hotplug devices, it just handle them as if they were just one resource. > >>> I don't insist the interface of dm-ioband is the best. I just hope the > >>> cgroup infrastructure support this kind of resources. > >>> > >> Sorry, I did not understand fully. Can you please explain in detail what > >> kind of situation will not be covered by cgroup interface. 
> > > > From the concept of the cgroup, if you want control several disks > > independently, you should make each disk have its own cgroup subsystem, > > which only can be defined when compiling the kernel. This is impossible > > because every linux box has various number of disks. > > mmh? not true. You can define a single cgroup subsystem that implements > the opportune interfaces to apply your type of control, and use many > structures allocated dynamically for each controlled object (one for > each block device, disk, partition, ... or using any kind of > grouping/splitting policy). Actually, this is how cgroup-io-throttle, as > well as any other cgroup subsystem, is implemented. > > > So you think it may be possible to make each cgroup have lots of control > > files for each device as a workaround. But it isn't allowed to add/remove > > control files when some devices are hot-added or hot-removed. > > Why not a single control file for all the devices? This is possible but I wonder if this is really the way we should go. It looks like you tried implementing another ioctl-like interface on the cgroup control file interface. You can do anything you want with this interface though. I guess there should be at least some rules to implement this kind of ioctl-like interface if they don't want to enhance the cgroup interface, Thank you, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 15:56 ` Andrea Righi (?) (?) @ 2008-09-29 10:40 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-29 10:40 UTC (permalink / raw) To: righi.andrea Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, balbir Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer > >>>>> if it is really a big problem. > >>>>> But I doubt it can control every control block I/O as we wish since > >>>>> the interface the cgroup supports is quite poor. > >>>> Had a question regarding cgroup interface. I am assuming that in a system, > >>>> one will be using other controllers as well apart from IO-controller. > >>>> Other controllers will be using cgroup as a grouping mechanism. > >>>> Now coming up with additional grouping mechanism for only io-controller seems > >>>> little odd to me. It will make the job of higher level management software > >>>> harder. > >>>> > >>>> Looking at the dm-ioband grouping examples given in patches, I think cases > >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right > >>>> cgroup and making sure applications are launched/moved into right cgroup by > >>>> user space tools. > >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking > >>> can be replaced with cgroup once the implementation of bio-cgroup is done. > >>> > >>> I think problems of cgroup are that they can't support lots of storages > >>> and hotplug devices, it just handle them as if they were just one resource. > >>> I don't insist the interface of dm-ioband is the best. I just hope the > >>> cgroup infrastructure support this kind of resources. > >>> > >> Sorry, I did not understand fully. Can you please explain in detail what > >> kind of situation will not be covered by cgroup interface. 
> > > > From the concept of the cgroup, if you want control several disks > > independently, you should make each disk have its own cgroup subsystem, > > which only can be defined when compiling the kernel. This is impossible > > because every linux box has various number of disks. > > mmh? not true. You can define a single cgroup subsystem that implements > the opportune interfaces to apply your type of control, and use many > structures allocated dynamically for each controlled object (one for > each block device, disk, partition, ... or using any kind of > grouping/splitting policy). Actually, this is how cgroup-io-throttle, as > well as any other cgroup subsystem, is implemented. > > > So you think it may be possible to make each cgroup have lots of control > > files for each device as a workaround. But it isn't allowed to add/remove > > control files when some devices are hot-added or hot-removed. > > Why not a single control file for all the devices? This is possible but I wonder if this is really the way we should go. It looks like you tried implementing another ioctl-like interface on the cgroup control file interface. You can do anything you want with this interface though. I guess there should be at least some rules to implement this kind of ioctl-like interface if they don't want to enhance the cgroup interface, Thank you, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <48DD0617.3050403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <48DD0617.3050403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2008-09-29 10:40 ` Hirokazu Takahashi 0 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-29 10:40 UTC (permalink / raw) To: righi.andrea-Re5JQEeQqe8AvxtiuMwx3w Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer > >>>>> if it is really a big problem. > >>>>> But I doubt it can control every control block I/O as we wish since > >>>>> the interface the cgroup supports is quite poor. > >>>> Had a question regarding cgroup interface. I am assuming that in a system, > >>>> one will be using other controllers as well apart from IO-controller. > >>>> Other controllers will be using cgroup as a grouping mechanism. > >>>> Now coming up with additional grouping mechanism for only io-controller seems > >>>> little odd to me. It will make the job of higher level management software > >>>> harder. > >>>> > >>>> Looking at the dm-ioband grouping examples given in patches, I think cases > >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right > >>>> cgroup and making sure applications are launched/moved into right cgroup by > >>>> user space tools. > >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking > >>> can be replaced with cgroup once the implementation of bio-cgroup is done. > >>> > >>> I think problems of cgroup are that they can't support lots of storages > >>> and hotplug devices, it just handle them as if they were just one resource. 
> >>> I don't insist the interface of dm-ioband is the best. I just hope the > >>> cgroup infrastructure support this kind of resources. > >>> > >> Sorry, I did not understand fully. Can you please explain in detail what > >> kind of situation will not be covered by cgroup interface. > > > > From the concept of the cgroup, if you want control several disks > > independently, you should make each disk have its own cgroup subsystem, > > which only can be defined when compiling the kernel. This is impossible > > because every linux box has various number of disks. > > mmh? not true. You can define a single cgroup subsystem that implements > the opportune interfaces to apply your type of control, and use many > structures allocated dynamically for each controlled object (one for > each block device, disk, partition, ... or using any kind of > grouping/splitting policy). Actually, this is how cgroup-io-throttle, as > well as any other cgroup subsystem, is implemented. > > > So you think it may be possible to make each cgroup have lots of control > > files for each device as a workaround. But it isn't allowed to add/remove > > control files when some devices are hot-added or hot-removed. > > Why not a single control file for all the devices? This is possible but I wonder if this is really the way we should go. It looks like you tried implementing another ioctl-like interface on the cgroup control file interface. You can do anything you want with this interface though. I guess there should be at least some rules to implement this kind of ioctl-like interface if they don't want to enhance the cgroup interface, Thank you, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
[parent not found: <20080926.220418.83079316.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>]
* Re: dm-ioband + bio-cgroup benchmarks [not found] ` <20080926.220418.83079316.taka-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org> @ 2008-09-26 15:56 ` Andrea Righi 0 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 15:56 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Hirokazu Takahashi wrote: > Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer >>>>> if it is really a big problem. >>>>> But I doubt it can control every control block I/O as we wish since >>>>> the interface the cgroup supports is quite poor. >>>> Had a question regarding cgroup interface. I am assuming that in a system, >>>> one will be using other controllers as well apart from IO-controller. >>>> Other controllers will be using cgroup as a grouping mechanism. >>>> Now coming up with additional grouping mechanism for only io-controller seems >>>> little odd to me. It will make the job of higher level management software >>>> harder. >>>> >>>> Looking at the dm-ioband grouping examples given in patches, I think cases >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right >>>> cgroup and making sure applications are launched/moved into right cgroup by >>>> user space tools. >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking >>> can be replaced with cgroup once the implementation of bio-cgroup is done. >>> >>> I think problems of cgroup are that they can't support lots of storages >>> and hotplug devices, it just handle them as if they were just one resource. 
>>> I don't insist the interface of dm-ioband is the best. I just hope the >>> cgroup infrastructure support this kind of resources. >>> >> Sorry, I did not understand fully. Can you please explain in detail what >> kind of situation will not be covered by cgroup interface. > > From the concept of the cgroup, if you want control several disks > independently, you should make each disk have its own cgroup subsystem, > which only can be defined when compiling the kernel. This is impossible > because every linux box has various number of disks. mmh? not true. You can define a single cgroup subsystem that implements the opportune interfaces to apply your type of control, and use many structures allocated dynamically for each controlled object (one for each block device, disk, partition, ... or using any kind of grouping/splitting policy). Actually, this is how cgroup-io-throttle, as well as any other cgroup subsystem, is implemented. > So you think it may be possible to make each cgroup have lots of control > files for each device as a workaround. But it isn't allowed to add/remove > control files when some devices are hot-added or hot-removed. Why not a single control file for all the devices? -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-26 13:04 ` Hirokazu Takahashi ` (2 preceding siblings ...) (?) @ 2008-09-26 15:56 ` Andrea Righi -1 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-26 15:56 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, vgoyal, balbir Hirokazu Takahashi wrote: > Hi, > >>>>> It's possible the algorithm of dm-ioband can be placed in the block layer >>>>> if it is really a big problem. >>>>> But I doubt it can control every control block I/O as we wish since >>>>> the interface the cgroup supports is quite poor. >>>> Had a question regarding cgroup interface. I am assuming that in a system, >>>> one will be using other controllers as well apart from IO-controller. >>>> Other controllers will be using cgroup as a grouping mechanism. >>>> Now coming up with additional grouping mechanism for only io-controller seems >>>> little odd to me. It will make the job of higher level management software >>>> harder. >>>> >>>> Looking at the dm-ioband grouping examples given in patches, I think cases >>>> of grouping based in pid, pgrp, uid and kvm can be handled by creating right >>>> cgroup and making sure applications are launched/moved into right cgroup by >>>> user space tools. >>> Grouping in pid, pgrp and uid is not the point, which I've been thinking >>> can be replaced with cgroup once the implementation of bio-cgroup is done. >>> >>> I think problems of cgroup are that they can't support lots of storages >>> and hotplug devices, it just handle them as if they were just one resource. >>> I don't insist the interface of dm-ioband is the best. I just hope the >>> cgroup infrastructure support this kind of resources. >>> >> Sorry, I did not understand fully. Can you please explain in detail what >> kind of situation will not be covered by cgroup interface. 
> > From the concept of the cgroup, if you want control several disks > independently, you should make each disk have its own cgroup subsystem, > which only can be defined when compiling the kernel. This is impossible > because every linux box has various number of disks. mmh? not true. You can define a single cgroup subsystem that implements the opportune interfaces to apply your type of control, and use many structures allocated dynamically for each controlled object (one for each block device, disk, partition, ... or using any kind of grouping/splitting policy). Actually, this is how cgroup-io-throttle, as well as any other cgroup subsystem, is implemented. > So you think it may be possible to make each cgroup have lots of control > files for each device as a workaround. But it isn't allowed to add/remove > control files when some devices are hot-added or hot-removed. Why not a single control file for all the devices? -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-24 14:53 ` Vivek Goyal ` (2 preceding siblings ...) (?) @ 2008-09-26 13:04 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-26 13:04 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > > > It's possible the algorithm of dm-ioband can be placed in the block layer > > > > if it is really a big problem. > > > > But I doubt it can control every control block I/O as we wish since > > > > the interface the cgroup supports is quite poor. > > > > > > Had a question regarding cgroup interface. I am assuming that in a system, > > > one will be using other controllers as well apart from IO-controller. > > > Other controllers will be using cgroup as a grouping mechanism. > > > Now coming up with additional grouping mechanism for only io-controller seems > > > little odd to me. It will make the job of higher level management software > > > harder. > > > > > > Looking at the dm-ioband grouping examples given in patches, I think cases > > > of grouping based in pid, pgrp, uid and kvm can be handled by creating right > > > cgroup and making sure applications are launched/moved into right cgroup by > > > user space tools. > > > > Grouping in pid, pgrp and uid is not the point, which I've been thinking > > can be replaced with cgroup once the implementation of bio-cgroup is done. > > > > I think problems of cgroup are that they can't support lots of storages > > and hotplug devices, it just handle them as if they were just one resource. > > I don't insist the interface of dm-ioband is the best. I just hope the > > cgroup infrastructure support this kind of resources. > > > > Sorry, I did not understand fully. Can you please explain in detail what > kind of situation will not be covered by cgroup interface. 
From the concept of the cgroup, if you want to control several disks independently, you have to give each disk its own cgroup subsystem, which can only be defined at kernel compile time. This is impossible because every Linux box has a different number of disks. So you may think it is possible to make each cgroup have lots of control files, one per device, as a workaround. But it isn't allowed to add/remove control files when devices are hot-added or hot-removed. Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 13:10 ` Vivek Goyal 2008-09-19 20:28 ` Andrea Righi 2008-09-22 9:36 ` Hirokazu Takahashi @ 2008-09-22 9:36 ` Hirokazu Takahashi [not found] ` <20080919131019.GA3606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 3 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-22 9:36 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > > bandwidth even when it accepts delayed write requests. > > > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > > reason was to avoid a performance bottleneck due to mechanical factors > > > > such as seek time. > > > > > > > > You can see the details of the benchmarks at: > > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > > > (snip) > > > > > Secondly, why do we have to create an additional dm-ioband device for > > > every device we want to control using rules. This looks little odd > > > atleast to me. Can't we keep it in line with rest of the controllers > > > where task grouping takes place using cgroup and rules are specified in > > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > > > It isn't essential dm-band is implemented as one of the device-mappers. > > I've been also considering that this algorithm itself can be implemented > > in the block layer directly. > > > > Although, the current implementation has merits. It is flexible. > > - Dm-ioband can be place anywhere you like, which may be right before > > the I/O schedulers or may be placed on top of LVM devices. > > Hi, > > An rb-tree per request queue also should be able to give us this > flexibility. Because logic is implemented per request queue, rules can be > placed at any layer. 
Either at bottom most layer where requests are > passed to elevator or at higher layer where requests will be passed to > lower level block devices in the stack. Just that we shall have to do > modifications to some of the higher level dm/md drivers to make use of > queuing cgroup requests and releasing cgroup requests to lower layers. Request descriptors are allocated right before I/O requests are passed to the elevators. Even if you move the descriptor allocation point to before the dm/md drivers are called, the drivers can't make use of the descriptors. When one of the dm drivers accepts an I/O request, the request doesn't yet have a real device number or a real sector number. The request will be re-mapped to another sector of another device by each dm driver, and may even be replicated there. So it is really hard to find the right request queue to put the request into and to sort requests on that queue. > > - It supports partition based bandwidth control which can work without > > cgroups, which is quite easy to use of. > > - It is independent to any I/O schedulers including ones which will > > be introduced in the future. > > This scheme should also be independent of any of the IO schedulers. We > might have to do small changes in IO-schedulers to decouple the things > from __make_request() a bit to insert rb-tree in between __make_request() > and IO-scheduler. Otherwise fundamentally, this approach should not > require any major modifications to IO-schedulers. > > > > > I also understand it's will be hard to set up without some tools > > such as lvm commands. > > > > That's something I wish to avoid. If we can keep it simple by doing > grouping using cgroup and allow one line rules in cgroup it would be nice. It's possible that the algorithm of dm-ioband could be placed in the block layer if this is really a big problem. But I doubt it could control every kind of block I/O as we wish, since the interface the cgroup infrastructure supports is quite poor.
> > > To avoid creation of stacking another device (dm-ioband) on top of every > > > device we want to subject to rules, I was thinking of maintaining an > > > rb-tree per request queue. Requests will first go into this rb-tree upon > > > __make_request() and then will filter down to elevator associated with the > > > queue (if there is one). This will provide us the control of releasing > > > bio's to elevaor based on policies (proportional weight, max bandwidth > > > etc) and no need of stacking additional block device. > > > > I think it's a bit late to control I/O requests there, since process > > may be blocked in get_request_wait when the I/O load is high. > > Please imagine the situation that cgroups with low bandwidths are > > consuming most of "struct request"s while another cgroup with a high > > bandwidth is blocked and can't get enough "struct request"s. > > > > It means cgroups that issues lot of I/O request can win the game. > > > > Ok, this is a good point. Because number of struct requests are limited > and they seem to be allocated on first come first serve basis, so if a > cgroup is generating lot of IO, then it might win. > > But dm-ioband will face the same issue. Nope. Dm-ioband doesn't have this issue since it throttles before the descriptors are allocated. Only I/O requests that dm-ioband has already passed through can allocate descriptors. > Essentially it is also a request > queue and it will have limited number of request descriptors. Have you > modified the logic somewhere for allocation of request descriptors to the > waiting processes based on their weights? If yes, the logic probably can > be implemented here too. I feel this is almost what dm-ioband is doing. > Thanks > Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
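The starvation argument in this exchange can be shown with a small simulation: under first-come-first-served allocation, a cgroup issuing many requests grabs the whole descriptor pool, while a (hypothetical) per-group quota proportional to weight prevents that. The group names, pool size, and quota policy below are made up; this illustrates the argument, not kernel code:

```python
def fcfs_alloc(arrivals, pool_size):
    """Hand out a limited descriptor pool strictly in arrival order,
    the way struct requests appear to be allocated today."""
    granted = {}
    for group in arrivals[:pool_size]:
        granted[group] = granted.get(group, 0) + 1
    return granted

def weighted_alloc(arrivals, pool_size, weights):
    """Cap each group at a share of the pool proportional to its
    weight, so a noisy group cannot starve the others."""
    total = sum(weights.values())
    quota = {g: max(1, pool_size * w // total) for g, w in weights.items()}
    granted = {g: 0 for g in weights}
    for group in arrivals:
        if granted[group] < quota[group]:
            granted[group] += 1
    return granted
```

With 100 requests from a noisy group arriving ahead of 5 from a quiet one, FCFS gives the quiet group nothing, while the weighted variant serves it in full; throttling before allocation, as dm-ioband does, achieves the same effect by never letting the noisy group reach the pool.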
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 13:10 ` Vivek Goyal @ 2008-09-19 20:28 ` Andrea Righi 2008-09-22 9:36 ` Hirokazu Takahashi ` (2 subsequent siblings) 3 siblings, 0 replies; 140+ messages in thread From: Andrea Righi @ 2008-09-19 20:28 UTC (permalink / raw) To: Vivek Goyal Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-9JcytcrH/bA+uJoB2kUjGw, xemul-GEFAQzZX7r8dnm+yROfE0A, fernando-gVGce1chcLdL9jVzuh4AOg, balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8 Vivek Goyal wrote: > On Fri, Sep 19, 2008 at 08:20:31PM +0900, Hirokazu Takahashi wrote: >>> To avoid creation of stacking another device (dm-ioband) on top of every >>> device we want to subject to rules, I was thinking of maintaining an >>> rb-tree per request queue. Requests will first go into this rb-tree upon >>> __make_request() and then will filter down to elevator associated with the >>> queue (if there is one). This will provide us the control of releasing >>> bio's to elevaor based on policies (proportional weight, max bandwidth >>> etc) and no need of stacking additional block device. >> I think it's a bit late to control I/O requests there, since process >> may be blocked in get_request_wait when the I/O load is high. >> Please imagine the situation that cgroups with low bandwidths are >> consuming most of "struct request"s while another cgroup with a high >> bandwidth is blocked and can't get enough "struct request"s. >> >> It means cgroups that issues lot of I/O request can win the game. >> > > Ok, this is a good point. Because number of struct requests are limited > and they seem to be allocated on first come first serve basis, so if a > cgroup is generating lot of IO, then it might win. > > But dm-ioband will face the same issue. 
Essentially it is also a request > queue and it will have limited number of request descriptors. Have you > modified the logic somewhere for allocation of request descriptors to the > waiting processes based on their weights? If yes, the logic probably can > be implemented here too. Maybe throttling dirty page ratio in memory could help to avoid this problem. I mean, if a cgroup is exceeding the i/o limits do ehm... something.. also at the balance_dirty_pages() level. -Andrea ^ permalink raw reply [flat|nested] 140+ messages in thread
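Andrea's suggestion — throttle at the balance_dirty_pages() level when a cgroup exceeds its share — can be sketched as a simulation in which a writer stalls once its group's dirty-page count hits a limit, and each stall lets writeback clean some pages. The limit and writeback rate are made-up numbers; only the stalling mechanism is the point:

```python
def buffered_writes(n_pages, dirty_limit, writeback_per_stall):
    """Simulate n_pages buffered writes from one cgroup.  Once the
    group's dirty-page count reaches its limit, the writer stalls
    (as a per-cgroup balance_dirty_pages() check would make it)
    until writeback cleans pages below the limit."""
    dirty = 0
    stalls = 0
    for _ in range(n_pages):
        while dirty >= dirty_limit:      # over the group's share: wait
            dirty = max(0, dirty - writeback_per_stall)
            stalls += 1
        dirty += 1                       # the write dirties one page
    return dirty, stalls
```

A writer staying under its limit never stalls; a heavy writer is forced to proceed at the pace of its group's writeback.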
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 20:28 ` Andrea Righi @ 2008-09-22 9:45 ` Hirokazu Takahashi -1 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-22 9:45 UTC (permalink / raw) To: righi.andrea Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, agk, ryov, xemul, fernando, vgoyal, balbir Hi, > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >> I think it's a bit late to control I/O requests there, since process > >> may be blocked in get_request_wait when the I/O load is high. > >> Please imagine the situation that cgroups with low bandwidths are > >> consuming most of "struct request"s while another cgroup with a high > >> bandwidth is blocked and can't get enough "struct request"s. > >> > >> It means cgroups that issues lot of I/O request can win the game. > >> > > > > Ok, this is a good point. Because number of struct requests are limited > > and they seem to be allocated on first come first serve basis, so if a > > cgroup is generating lot of IO, then it might win. > > > > But dm-ioband will face the same issue. Essentially it is also a request > > queue and it will have limited number of request descriptors. Have you > > modified the logic somewhere for allocation of request descriptors to the > > waiting processes based on their weights? If yes, the logic probably can > > be implemented here too. > > Maybe throttling dirty page ratio in memory could help to avoid this problem. 
> I mean, if a cgroup is exceeding the i/o limits do ehm... something.. also at > the balance_dirty_pages() level. That is one of the important features to be implemented for controlling I/O. Controlling the dirty page ratio can help to avoid this issue, but it isn't guaranteed. So both of them should be implemented. What do you think happens when some cgroups have tons of threads issuing a lot of direct I/O, while others have huge amounts of memory? Thanks, Hirokazu Takahashi. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-19 11:20 ` Hirokazu Takahashi ` (2 preceding siblings ...) (?) @ 2008-09-19 13:10 ` Vivek Goyal -1 siblings, 0 replies; 140+ messages in thread From: Vivek Goyal @ 2008-09-19 13:10 UTC (permalink / raw) To: Hirokazu Takahashi Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir On Fri, Sep 19, 2008 at 08:20:31PM +0900, Hirokazu Takahashi wrote: > Hi, > > > > Hi All, > > > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > > bandwidth even when it accepts delayed write requests. > > > > > > In this time, I ran some benchmarks with a high-end storage. The > > > reason was to avoid a performance bottleneck due to mechanical factors > > > such as seek time. > > > > > > You can see the details of the benchmarks at: > > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > > (snip) > > > Secondly, why do we have to create an additional dm-ioband device for > > every device we want to control using rules. This looks little odd > > atleast to me. Can't we keep it in line with rest of the controllers > > where task grouping takes place using cgroup and rules are specified in > > cgroup itself (The way Andrea Righi does for io-throttling patches)? > > It isn't essential dm-band is implemented as one of the device-mappers. > I've been also considering that this algorithm itself can be implemented > in the block layer directly. > > Although, the current implementation has merits. It is flexible. > - Dm-ioband can be place anywhere you like, which may be right before > the I/O schedulers or may be placed on top of LVM devices. Hi, An rb-tree per request queue also should be able to give us this flexibility. Because logic is implemented per request queue, rules can be placed at any layer. 
Either at bottom most layer where requests are passed to elevator or at higher layer where requests will be passed to lower level block devices in the stack. Just that we shall have to do modifications to some of the higher level dm/md drivers to make use of queuing cgroup requests and releasing cgroup requests to lower layers. > - It supports partition based bandwidth control which can work without > cgroups, which is quite easy to use of. > - It is independent to any I/O schedulers including ones which will > be introduced in the future. This scheme should also be independent of any of the IO schedulers. We might have to do small changes in IO-schedulers to decouple the things from __make_request() a bit to insert rb-tree in between __make_request() and IO-scheduler. Otherwise fundamentally, this approach should not require any major modifications to IO-schedulers. > > I also understand it's will be hard to set up without some tools > such as lvm commands. > That's something I wish to avoid. If we can keep it simple by doing grouping using cgroup and allow one line rules in cgroup it would be nice. > > To avoid creation of stacking another device (dm-ioband) on top of every > > device we want to subject to rules, I was thinking of maintaining an > > rb-tree per request queue. Requests will first go into this rb-tree upon > > __make_request() and then will filter down to elevator associated with the > > queue (if there is one). This will provide us the control of releasing > > bio's to elevaor based on policies (proportional weight, max bandwidth > > etc) and no need of stacking additional block device. > > I think it's a bit late to control I/O requests there, since process > may be blocked in get_request_wait when the I/O load is high. > Please imagine the situation that cgroups with low bandwidths are > consuming most of "struct request"s while another cgroup with a high > bandwidth is blocked and can't get enough "struct request"s. 
> > It means cgroups that issues lot of I/O request can win the game. > Ok, this is a good point. The number of struct requests is limited and they seem to be allocated on a first-come-first-served basis, so a cgroup that generates a lot of IO might win. But dm-ioband will face the same issue. Essentially it is also a request queue and it will have a limited number of request descriptors. Have you modified the allocation logic somewhere so that request descriptors are handed to waiting processes based on their weights? If yes, that logic can probably be implemented here too. Thanks Vivek ^ permalink raw reply [flat|nested] 140+ messages in thread
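The rb-tree stage Vivek proposes sorts queued bios and releases them to the elevator in proportion to group weight. A minimal way to see the intended ordering is a weighted-fair-queuing sketch: each bio is stamped with a weighted virtual finish time and released in stamp order (a sorted list stands in for the kernel rb-tree; the group names and weights are invented):

```python
def release_order(bios, weights):
    """Stamp each (group, size) bio with a weighted virtual finish
    time and release in that order, so the service each group gets
    over a window is proportional to its weight."""
    vtime = {g: 0.0 for g in weights}
    stamped = []
    for seq, (group, size) in enumerate(bios):
        vtime[group] += size / weights[group]   # weighted virtual finish time
        stamped.append((vtime[group], seq, group))
    stamped.sort()                              # the rb-tree's sorted order
    return [group for _, _, group in stamped]
```

If group A has twice B's weight and both queue six unit-size bios, the release order serves A twice as often as B over the window where both have bios pending.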
* Re: dm-ioband + bio-cgroup benchmarks 2008-09-18 13:15 ` Vivek Goyal ` (5 preceding siblings ...) 2008-09-19 11:20 ` Hirokazu Takahashi @ 2008-09-19 11:20 ` Hirokazu Takahashi 6 siblings, 0 replies; 140+ messages in thread From: Hirokazu Takahashi @ 2008-09-19 11:20 UTC (permalink / raw) To: vgoyal Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, righi.andrea, agk, xemul, fernando, balbir Hi, > > Hi All, > > > > I have got excellent results of dm-ioband, that controls the disk I/O > > bandwidth even when it accepts delayed write requests. > > > > In this time, I ran some benchmarks with a high-end storage. The > > reason was to avoid a performance bottleneck due to mechanical factors > > such as seek time. > > > > You can see the details of the benchmarks at: > > http://people.valinux.co.jp/~ryov/dm-ioband/hps/ (snip) > Secondly, why do we have to create an additional dm-ioband device for > every device we want to control using rules. This looks little odd > atleast to me. Can't we keep it in line with rest of the controllers > where task grouping takes place using cgroup and rules are specified in > cgroup itself (The way Andrea Righi does for io-throttling patches)? It isn't essential that dm-ioband be implemented as a device-mapper driver. I've also been considering that this algorithm itself could be implemented directly in the block layer. The current implementation has merits, though. It is flexible. - Dm-ioband can be placed anywhere you like, which may be right before the I/O schedulers or on top of LVM devices. - It supports partition-based bandwidth control, which can work without cgroups and is quite easy to use. - It is independent of any I/O scheduler, including ones which will be introduced in the future. I also understand it will be hard to set up without some tools such as the lvm commands.
> To avoid stacking another device (dm-ioband) on top of every device we
> want to subject to rules, I was thinking of maintaining an rb-tree per
> request queue. Requests would first go into this rb-tree upon
> __make_request() and then filter down to the elevator associated with
> the queue (if there is one). This would give us control over releasing
> bios to the elevator based on policies (proportional weight, max
> bandwidth, etc.) with no need to stack an additional block device.

I think it's a bit too late to control I/O requests there, since a
process may already be blocked in get_request_wait() when the I/O load
is high. Imagine the situation where cgroups with low bandwidths are
consuming most of the "struct request"s while another cgroup with a
high bandwidth is blocked and can't get enough "struct request"s.
It means cgroups that issue a lot of I/O requests can win the game.

> I am working on some experimental proof-of-concept patches. It will
> take some time though.
>
> I was thinking of the following.
>
> - Adopt Andrea Righi's style of specifying rules for devices and group
>   the tasks using cgroups.
>
> - To begin with, adopt dm-ioband's approach of a proportional
>   bandwidth controller. It makes sense to me to limit bandwidth usage
>   only in case of contention. If there is really a need to limit max
>   bandwidth, then we can probably implement additional rules, or some
>   policy switcher where the user decides what kind of policies to
>   apply.
>
> - Get rid of dm-ioband and instead buffer requests on an rb-tree on
>   every request queue, controlled by some kind of cgroup rules.
>
> It would be good to discuss the above approach now, whether it makes
> sense or not. I think it is a kind of fusion of the io-throttling and
> dm-ioband patches, with the additional idea of doing I/O control just
> above the elevator, on the request queue, using an rb-tree.
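The starvation Takahashi describes is easy to reproduce in miniature: descriptors are granted first-come, first-served until the shared pool is empty, so weights never get a chance to matter. A few lines of userspace Python model it (illustrative only, not kernel code):

```python
# Illustrative model, not kernel code: first-come, first-served
# allocation of a shared request-descriptor pool ignores weights.

NR_REQUESTS = 128      # size of the shared pool (like q->nr_requests)
in_flight = []         # descriptors currently held, tagged by cgroup

def get_request(cgroup):
    """Grant a descriptor to whoever asks first."""
    if len(in_flight) >= NR_REQUESTS:
        return False   # the caller would block in get_request_wait()
    in_flight.append(cgroup)
    return True

# A low-weight cgroup floods the queue before the high-weight one runs:
low_granted = sum(get_request("low-weight") for _ in range(NR_REQUESTS))
# the pool is now empty, so the high-weight cgroup gets nothing at all.
high_granted = sum(get_request("high-weight") for _ in range(NR_REQUESTS))
print(low_granted, high_granted)
```

However the buffered bios are later reordered above the elevator, the high-weight group here is already stuck waiting for a descriptor before any such policy can see its I/O.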
* Re: dm-ioband + bio-cgroup benchmarks
From: Vivek Goyal @ 2008-09-18 13:15 UTC
To: Ryo Tsuruta
Cc: xen-devel, containers, jens.axboe, linux-kernel, virtualization, dm-devel, Andrea Righi, agk, xemul, fernando, balbir

On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote:
> Hi All,
>
> I have got excellent results with dm-ioband, which controls the disk I/O
> bandwidth even when it accepts delayed write requests.
>
> This time, I ran some benchmarks with a high-end storage array. The
> reason was to avoid a performance bottleneck due to mechanical factors
> such as seek time.
>
> You can see the details of the benchmarks at:
> http://people.valinux.co.jp/~ryov/dm-ioband/hps/

Hi Ryo,

I had a query about the dm-ioband patches. IIUC, the dm-ioband patches
will break the notion of process priority in CFQ, because the dm-ioband
device now holds bios and issues them to the lower layers later, based
on which bios become ready. Hence the actual bio-submitting context
might be different, and because CFQ derives the io_context from the
current task, priority will be broken. To mitigate that problem, we
probably need to implement Fernando's suggestion of putting an
io_context pointer in the bio. Have you already done something to solve
this issue?

Secondly, why do we have to create an additional dm-ioband device for
every device we want to control using rules? This looks a little odd,
at least to me.
Can't we keep it in line with the rest of the controllers, where task
grouping takes place using cgroups and the rules are specified in the
cgroup itself (the way Andrea Righi does for the io-throttling patches)?

To avoid stacking another device (dm-ioband) on top of every device we
want to subject to rules, I was thinking of maintaining an rb-tree per
request queue. Requests would first go into this rb-tree upon
__make_request() and then filter down to the elevator associated with
the queue (if there is one). This would give us control over releasing
bios to the elevator based on policies (proportional weight, max
bandwidth, etc.) with no need to stack an additional block device.

I am working on some experimental proof-of-concept patches. It will
take some time though.

I was thinking of the following.

- Adopt Andrea Righi's style of specifying rules for devices and group
  the tasks using cgroups.

- To begin with, adopt dm-ioband's approach of a proportional bandwidth
  controller. It makes sense to me to limit bandwidth usage only in
  case of contention. If there is really a need to limit max bandwidth,
  then we can probably implement additional rules, or some policy
  switcher where the user decides what kind of policies to apply.

- Get rid of dm-ioband and instead buffer requests on an rb-tree on
  every request queue, controlled by some kind of cgroup rules.

It would be good to discuss the above approach now, whether it makes
sense or not. I think it is a kind of fusion of the io-throttling and
dm-ioband patches, with the additional idea of doing I/O control just
above the elevator, on the request queue, using an rb-tree.

Thanks
Vivek
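The per-queue buffering sketched above can be modeled as a virtual-time proportional-share scheduler: each bio is keyed by its cgroup's virtual finish time, buffered in an ordered structure, and released to the elevator in key order. In this illustrative Python model a heap stands in for the kernel rb-tree, and all names are hypothetical:

```python
import heapq
from collections import Counter

class ProportionalQueue:
    """Buffer bios per request queue and release them to the elevator
    in order of cgroup virtual time (weighted fair queuing)."""
    def __init__(self, weights):
        self.weights = weights
        self.vtime = {cg: 0.0 for cg in weights}   # per-cgroup virtual clock
        self.tree = []                             # heap stands in for the rb-tree
        self.seq = 0                               # tie-breaker for equal keys

    def submit(self, cgroup, bio):
        # Advance the group's virtual time inversely to its weight:
        # heavy groups accumulate vtime slowly, so they are picked more often.
        self.vtime[cgroup] += 1.0 / self.weights[cgroup]
        heapq.heappush(self.tree, (self.vtime[cgroup], self.seq, cgroup, bio))
        self.seq += 1

    def dispatch(self):
        """Release the buffered bio with the smallest virtual time."""
        _, _, cgroup, bio = heapq.heappop(self.tree)
        return cgroup, bio

q = ProportionalQueue(weights={"A": 3, "B": 1})
for i in range(40):                # both groups keep the queue full
    q.submit("A", f"bioA{i}")
    q.submit("B", f"bioB{i}")

first16 = Counter(q.dispatch()[0] for _ in range(16))
print(first16)
```

With weights 3:1, the first 16 dispatches split 12:4 between the two groups, i.e. bandwidth is shared in proportion to weight even though both groups are backlogged.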
* Re: dm-ioband + bio-cgroup benchmarks
From: Takuya Yoshikawa @ 2008-09-19 8:49 UTC
To: Ryo Tsuruta
Cc: xen-devel, containers, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, kamezawa.hiroyu, balbir

Hi Tsuruta-san,

Ryo Tsuruta wrote:
> Hi All,
>
> I have got excellent results with dm-ioband, which controls the disk I/O
> bandwidth even when it accepts delayed write requests.
>
> This time, I ran some benchmarks with a high-end storage array. The
> reason was to avoid a performance bottleneck due to mechanical factors
> such as seek time.
>
> You can see the details of the benchmarks at:
> http://people.valinux.co.jp/~ryov/dm-ioband/hps/

I took a look at your beautiful results!

When you have time, would you explain to me how you managed to measure
the time and bandwidth, especially when you ran the write() tests?
I tried similar tests and failed to measure the bandwidth correctly.
Did you insert something into the kernel source?

Thanks,
Takuya Yoshikawa
* Re: dm-ioband + bio-cgroup benchmarks
From: Ryo Tsuruta @ 2008-09-19 11:31 UTC
To: yoshikawa.takuya
Cc: xen-devel, containers, linux-kernel, virtualization, dm-devel, agk, xemul, fernando, kamezawa.hiroyu, balbir

Hi Yoshikawa-san,

> When you have time, would you explain to me how you managed to measure
> the time and bandwidth, especially when you ran the write() tests?
> I tried similar tests and failed to measure the bandwidth correctly.
> Did you insert something into the kernel source?

I'm using our own tool, which issues I/Os in parallel over a specified
period and counts how many I/Os are issued and how many bytes are
transferred in that period. I'm also using our own tool to measure
throughput variation and to inspect the internal data of dm-ioband;
that tool is implemented as a kernel module.

Thanks,
Ryo Tsuruta
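Since the tool itself is not published, here is a rough userspace approximation of the measurement it describes: run several writers in parallel for a fixed period and count the I/Os and bytes completed. Everything below is an assumption about the general approach, not the real tool:

```python
import os
import tempfile
import threading
import time

def measure_write_bandwidth(path, nthreads=4, duration=0.5, bs=64 * 1024):
    """Run nthreads writers against files under `path` for `duration`
    seconds and return (total_bytes, total_ios) completed in that period."""
    stop = time.monotonic() + duration
    counts = [0] * nthreads            # I/Os completed per thread

    def writer(idx):
        buf = b"\0" * bs
        with open(os.path.join(path, f"w{idx}"), "wb") as f:
            while time.monotonic() < stop:
                f.write(buf)
                f.flush()
                os.fsync(f.fileno())   # push the I/O out before counting it
                counts[idx] += 1

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ios = sum(counts)
    return ios * bs, ios

with tempfile.TemporaryDirectory() as d:
    nbytes, ios = measure_write_bandwidth(d, duration=0.2)
    print(f"{ios} I/Os, {nbytes / (1 << 20):.1f} MiB in 0.2s")
```

A real harness would write to the dm-ioband device with O_DIRECT, one group of writers per band device, over a much longer period; the structure is the same.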