* Re: RFC: Attaching threads to cgroups is OK?
  From: KAMEZAWA Hiroyuki @ 2008-08-19 11:22 UTC
  To: Takuya Yoshikawa
  Cc: menage@google.com, containers, balbir@linux.vnet.ibm.com, fernando, virtualization

On Tue, 19 Aug 2008 19:38:14 +0900
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> wrote:

> Hi everyone,
>
> I have a question about cgroup's policy concerning the treatment of
> threads. Please consider that we want to attach an application which
> already has some threads to a certain cgroup. If we echo the pid of
> this application to the "tasks" file connected to this cgroup, the
> threads belonging to this application will NOT be moved to the new
> group. Is it right? If so, is it OK?
>
Added Paul and Balbir to CC:.

I think it is OK ... meaning it works as designed
(for now; see below about the future).

> I mean, in the current implementation, threads created before the
> attachment of the parent process are not treated equally to those
> created after.
>
> Could you tell me if you know something about the rules of attachment
> of pid, or tid, to cgroups? -- what ID is OK to write to the "tasks"
> file and what we can expect as a result?
>
Any PID is OK for "tasks". IIRC, Paul proposed a "procs" file, which
supports moving all threads of the same process.
This mail from Paul explains some of it: http://lwn.net/Articles/289930/

> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
> If we want to use your tracking functions for each thread separately,
> there seems to be a problem.
> ===cf. mm_get_bio_cgroup()===================
>               owner
>   mm_struct ----> task_struct ----> bio_cgroup
> =============================================
> In my understanding, the mm_struct of a thread is the same as its
> parent's. So, even if we attach the TIDs of some threads to different
> cgroups, the tracking always returns the same bio_cgroup -- its
> parent's group.
> Do you have some policy about in which case we can use your tracking?
>
It will be a restriction when the io-controller reuses information
about the owner of the memory. But if it is very clear who issues the
I/O (by tracking read/write syscalls), we may have a chance to record
the issuer of the I/O in the page_cgroup struct.

Thanks,
-Kame

> Thanks,
>   -- Takuya Yoshikawa
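For reference, the mm_get_bio_cgroup() walk discussed above amounts to
the following (a sketch modeled on the bio-cgroup patches;
task_to_bio_cgroup() stands in for the actual cgroup subsystem-state
lookup). The key point is that the walk starts from the mm_struct, so
every thread sharing that mm resolves to the same group:

	static struct bio_cgroup *mm_get_bio_cgroup(struct mm_struct *mm)
	{
		/*
		 * mm->owner is the task the memory controller charges
		 * for this mm; all threads of a process share it, so
		 * per-thread cgroup placement is invisible here.
		 */
		struct task_struct *owner = rcu_dereference(mm->owner);

		return task_to_bio_cgroup(owner);
	}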
* Re: RFC: Attaching threads to cgroups is OK?
  From: Balbir Singh @ 2008-08-19 12:27 UTC
  To: KAMEZAWA Hiroyuki
  Cc: menage@google.com, fernando, containers, virtualization

KAMEZAWA Hiroyuki wrote:
> On Tue, 19 Aug 2008 19:38:14 +0900
> Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> wrote:
> [...]
>> Could you tell me if you know something about the rules of attachment
>> of pid, or tid, to cgroups? -- what ID is OK to write to the "tasks"
>> file and what we can expect as a result?
>>
> Any PID is OK for "tasks". IIRC, Paul proposed a "procs" file, which
> supports moving all threads of the same process.
> This mail from Paul explains some of it: http://lwn.net/Articles/289930/

Yes, this was also discussed at the containers mini-summit.

Please see
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/ccb5e818209af143

>> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
>> If we want to use your tracking functions for each thread separately,
>> there seems to be a problem.
>> ===cf. mm_get_bio_cgroup()===================
>>               owner
>>   mm_struct ----> task_struct ----> bio_cgroup
>> =============================================
>> In my understanding, the mm_struct of a thread is the same as its
>> parent's. So, even if we attach the TIDs of some threads to different
>> cgroups, the tracking always returns the same bio_cgroup -- its
>> parent's group.
>> Do you have some policy about in which case we can use your tracking?
>>
> It will be a restriction when the io-controller reuses information
> about the owner of the memory. But if it is very clear who issues the
> I/O (by tracking read/write syscalls), we may have a chance to record
> the issuer of the I/O in the page_cgroup struct.

We already do some tracking (at dirty time, IIRC) for task IO
accounting. For the memory controller, tasks are virtually grouped by
the mm_struct.

--
Balbir
* Re: RFC: Attaching threads to cgroups is OK?
  From: Fernando Luis Vázquez Cao @ 2008-08-19 12:52 UTC
  To: balbir
  Cc: menage@google.com, containers, KAMEZAWA Hiroyuki, virtualization

Hi Balbir, Kamezawa-san!

On Tue, 2008-08-19 at 17:57 +0530, Balbir Singh wrote:
> [...]
>> It will be a restriction when the io-controller reuses information
>> about the owner of the memory. But if it is very clear who issues the
>> I/O (by tracking read/write syscalls), we may have a chance to record
>> the issuer of the I/O in the page_cgroup struct.
>
> We already do some tracking (at dirty time, IIRC) for task IO
> accounting. For the memory controller, tasks are virtually grouped by
> the mm_struct.

Thank you for your comments and the links.

When it comes to io-tracking, such mm_struct-based grouping might not
be desirable. If everyone agrees, we could try to decouple bio-cgroup
from the memory controller-specific bits.

	- Fernando
* Re: RFC: Attaching threads to cgroups is OK?
  From: Takuya Yoshikawa @ 2008-08-20 5:52 UTC
  To: balbir
  Cc: menage@google.com, fernando, containers, KAMEZAWA Hiroyuki, virtualization

Thank you for the comments and links!

Balbir Singh wrote:
> KAMEZAWA Hiroyuki wrote:
> [...]
>> Any PID is OK for "tasks". IIRC, Paul proposed a "procs" file, which
>> supports moving all threads of the same process.
>> This mail from Paul explains some of it: http://lwn.net/Articles/289930/
>
> Yes, this was also discussed at the containers mini-summit.
>
> Please see
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/ccb5e818209af143

I read the discussions and understood the reasons.

> [...]
> We already do some tracking (at dirty time, IIRC) for task IO
> accounting. For the memory controller, tasks are virtually grouped by
> the mm_struct.

I understand that controlling threads with their parent through the
common mm_struct is useful and in some sense reasonable. But the
following situation seems a little bit confusing to me.

TWO POSSIBLE WAYS OF CONTROL -- after the parent process was moved
===================================================================
 NEW group: [parent process]----> COMMON mm_struct
                                        |
 OLD group: [threads]-------------------+

 Threads can be
   1. grouped by OLD group
   2. (virtually) grouped by mm_struct --> same as grouped by NEW group
===================================================================
* Re: RFC: Attaching threads to cgroups is OK?
  From: Hirokazu Takahashi @ 2008-08-20 7:12 UTC
  To: fernando
  Cc: menage, virtualization, containers, balbir

Hi Fernando!

> [...]
> When it comes to io-tracking, such mm_struct-based grouping might not
> be desirable. If everyone agrees, we could try to decouple bio-cgroup
> from the memory controller-specific bits.

From the technical point of view it would be possible, but I'm not sure
it is so useful. I guess it might be overkill.

I have designed the io-tracking mechanism of bio-cgroup based on the
memory controller because:

 - I wanted to reuse the existing code for the first step as far as I
   could. And I also think it's a good policy to make things generic so
   other functions can use them.

 - bio-cgroup should be used with the cgroup memory controller to
   control delayed write requests. Without the memory controller, a
   bio-cgroup may eat up lots of pages and turn them dirty. I don't
   want to imagine what would happen if such a bio-cgroup were assigned
   a low priority for I/O. If bio-cgroup causes this kind of trouble
   without the memory controller, I don't think it is worth designing a
   new io-tracking mechanism for it.

 - I think this kind of threaded application should control its I/O
   requests inside the application. It seems quite difficult to
   determine which thread is doing what kind of job in the application.
   We can just leave this issue to these types of applications, can't
   we? I guess most applications of this kind have been well designed
   already.

What do you think if we just leave it as it is and keep the code simple?

Thanks,
Hirokazu Takahashi.
* Re: RFC: Attaching threads to cgroups is OK?
  From: KAMEZAWA Hiroyuki @ 2008-08-20 8:43 UTC
  To: Hirokazu Takahashi
  Cc: containers, fernando, balbir, menage, virtualization

On Wed, 20 Aug 2008 16:12:47 +0900 (JST)
Hirokazu Takahashi <taka@valinux.co.jp> wrote:

> - I think this kind of threaded application should control its I/O
>   requests inside the application. It seems quite difficult to
>   determine which thread is doing what kind of job in the application.
>   We can just leave this issue to these types of applications, can't
>   we? I guess most applications of this kind have been well designed
>   already.
>
I agree. (It's better to postpone this.)

But maybe cpuset (HPC) applications use per-thread cgroup attachment.
So, I can't say there is no trouble even if we ignore threads.

Thanks,
-Kame
* Re: RFC: Attaching threads to cgroups is OK?
  From: Takuya Yoshikawa @ 2008-08-22 1:03 UTC
  To: KAMEZAWA Hiroyuki
  Cc: containers, virtualization, taka, menage, fernando, balbir

Hi Kamezawa-san,

KAMEZAWA Hiroyuki wrote:
> On Wed, 20 Aug 2008 16:12:47 +0900 (JST)
> Hirokazu Takahashi <taka@valinux.co.jp> wrote:
>
>> [...]
>
> I agree. (It's better to postpone this.)
>
> But maybe cpuset (HPC) applications use per-thread cgroup attachment.
> So, I can't say there is no trouble even if we ignore threads.
>
Thank you for the details. I feel that we are near a consensus on this
problem.

To make this clear not only for developers but also for users, what do
you think about writing documentation and implementing can_attach() to
reject attachments that are not intended to work? For example,
 1. reject attachments of threads via can_attach(),
 2. explain what will and will not happen if we attach threads,
 ...

Thanks,
Yoshikawa
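A can_attach() hook along the lines suggested above could look roughly
like this (illustrative only: the callback signature matches the cgroup
API of that era, but the policy itself is hypothetical):

	static int bio_cgroup_can_attach(struct cgroup_subsys *ss,
					 struct cgroup *cgrp,
					 struct task_struct *tsk)
	{
		/* Refuse to move an individual thread; only a thread
		 * group leader (a whole process) may be attached. */
		if (!thread_group_leader(tsk))
			return -EINVAL;
		return 0;
	}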
* Re: RFC: Attaching threads to cgroups is OK?
  From: Hirokazu Takahashi @ 2008-08-20 11:48 UTC
  To: kamezawa.hiroyu
  Cc: fernando, virtualization, menage, containers, balbir

Hi,

>> [...]
>> Do you have some policy about in which case we can use your tracking?
>>
> It will be a restriction when the io-controller reuses information
> about the owner of the memory. But if it is very clear who issues the
> I/O (by tracking read/write syscalls), we may have a chance to record
> the issuer of the I/O in the page_cgroup struct.

This might be a slightly different topic, though.
I've been thinking about where we should add hooks to track I/O
requests. I think the following set of hooks is enough whether we are
going to support thread-based cgroups or not.

 Hook-1: called when allocating a page, where the memory controller
         already has a hook.
 Hook-2: called when making a page in the page cache dirty.

For anonymous pages, Hook-1 is enough to track any type of I/O request.
For pages in the page cache, Hook-1 is also enough for read I/O because
the I/O is issued just once, right after allocating the page.
For write I/O requests to pages in the page cache, Hook-1 will be okay
in most cases, but sometimes a process in another cgroup may write the
pages. In this case, Hook-2 is needed to keep the I/O request tracking
accurate.

So it won't be hard to make bio-cgroup's I/O request tracking accurate.

I'm off until the 28th, thank you,
Hirokazu Takahashi.
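To make the two hooks concrete, they would sit roughly at the following
call sites (a sketch: bio_cgroup_set_owner()/bio_cgroup_reset_owner()
follow the naming of the bio-cgroup patches, and the call sites are
paraphrased):

	/* Hook-1: where the memory controller charges a freshly
	 * allocated page -- record the expected I/O issuer. */
	bio_cgroup_set_owner(page, mm);

	/* Hook-2: where a page-cache page is marked dirty --
	 * re-attribute the page to the task actually writing it. */
	bio_cgroup_reset_owner(page, current->mm);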
* Re: RFC: Attaching threads to cgroups is OK?
  From: Fernando Luis Vázquez Cao @ 2008-08-21 3:08 UTC
  To: Hirokazu Takahashi
  Cc: containers, virtualization, menage, kamezawa.hiroyu, balbir

On Wed, 2008-08-20 at 20:48 +0900, Hirokazu Takahashi wrote:
> [...]
> This might be a slightly different topic, though.
> I've been thinking about where we should add hooks to track I/O
> requests. I think the following set of hooks is enough whether we are
> going to support thread-based cgroups or not.
>
>  Hook-1: called when allocating a page, where the memory controller
>          already has a hook.
>  Hook-2: called when making a page in the page cache dirty.
>
> For anonymous pages, Hook-1 is enough to track any type of I/O request.
> For pages in the page cache, Hook-1 is also enough for read I/O because
> the I/O is issued just once, right after allocating the page.
> For write I/O requests to pages in the page cache, Hook-1 will be okay
> in most cases, but sometimes a process in another cgroup may write the
> pages. In this case, Hook-2 is needed to keep the I/O request tracking
> accurate.

This relative simplicity is what prompted me to say that we should
probably try to disentangle the io-tracking functionality from the
memory controller a bit more (of course, we should still reuse as much
of it as we can). The rationale for this is that the existing I/O
scheduler would benefit from proper io-tracking capabilities too, so
it'd be nice if we could have them even in non-cgroup-capable kernels.

As an aside, when the IO context of a certain IO operation is known
(synchronous IO comes to mind), I think it should be cached in the
resulting bio so that we can do without the expensive accesses to
bio_cgroup once it enters the block layer.
* Re: RFC: Attaching threads to cgroups is OK?
  From: Balbir Singh @ 2008-08-21 3:32 UTC
  To: Fernando Luis Vázquez Cao
  Cc: Hirokazu Takahashi, containers, menage, virtualization

Fernando Luis Vázquez Cao wrote:
> [...]
> This relative simplicity is what prompted me to say that we should
> probably try to disentangle the io-tracking functionality from the
> memory controller a bit more (of course, we should still reuse as much
> of it as we can). The rationale for this is that the existing I/O
> scheduler would benefit from proper io-tracking capabilities too, so
> it'd be nice if we could have them even in non-cgroup-capable kernels.

Hook 2 referred to in the mail above exists today in the form of task
IO accounting.

> As an aside, when the IO context of a certain IO operation is known
> (synchronous IO comes to mind), I think it should be cached in the
> resulting bio so that we can do without the expensive accesses to
> bio_cgroup once it enters the block layer.

Will this give you everything you need for accounting and control (from
the block layer)?

--
Balbir
* Re: RFC: Attaching threads to cgroups is OK?
  From: Fernando Luis Vázquez Cao @ 2008-08-21 5:25 UTC
  To: balbir
  Cc: Hirokazu Takahashi, containers, menage, virtualization

Hi Balbir,

On Thu, 2008-08-21 at 09:02 +0530, Balbir Singh wrote:
> [...]
> Hook 2 referred to in the mail above exists today in the form of task
> IO accounting.

Yup.

>> As an aside, when the IO context of a certain IO operation is known
>> (synchronous IO comes to mind), I think it should be cached in the
>> resulting bio so that we can do without the expensive accesses to
>> bio_cgroup once it enters the block layer.
>
> Will this give you everything you need for accounting and control
> (from the block layer)?

Well, it depends on what you are trying to achieve.

Current IO schedulers such as CFQ only care about the io_context when
scheduling requests. When a new request comes in, CFQ assumes that it
originated in the context of the current task, which obviously does not
hold true for buffered IO and aio. This problem could be solved by
using bio-cgroup for IO tracking, but accessing the io context
information is somewhat expensive:

  page->page_cgroup->bio_cgroup->io_context

If at the time of building a bio we know its io context (i.e. the
context of the task or cgroup that generated that bio), I think we
should store it in the bio itself, too. With this scheme, whenever the
kernel needs to know the io_context of a particular block IO operation,
it would first try to retrieve the io_context directly from the bio
and, if not available there, resort to the slow path (accessing it
through bio_cgroup). My gut feeling is that elevator-based IO resource
controllers would benefit from such an approach, too.

	- Fernando
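The two-step lookup proposed above could be sketched as follows (a
sketch only: bi_io_context is a hypothetical new bio field, and
get_io_context_from_page() stands in for the
page->page_cgroup->bio_cgroup->io_context walk):

	static struct io_context *bio_io_context(struct bio *bio)
	{
		/* Fast path: the context was cached when the bio was
		 * built, e.g. for synchronous I/O. */
		if (bio->bi_io_context)
			return bio->bi_io_context;

		/* Slow path: recover it from the owner of the page. */
		return get_io_context_from_page(bio_page(bio));
	}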
* Re: RFC: Attaching threads to cgroups is OK?
  From: Balbir Singh @ 2008-08-21 10:28 UTC
  To: Fernando Luis Vázquez Cao
  Cc: Hirokazu Takahashi, containers, menage, virtualization

Fernando Luis Vázquez Cao wrote:
> [...]
> Well, it depends on what you are trying to achieve.
>
> Current IO schedulers such as CFQ only care about the io_context when
> scheduling requests. When a new request comes in, CFQ assumes that it
> originated in the context of the current task, which obviously does
> not hold true for buffered IO and aio. This problem could be solved by
> using bio-cgroup for IO tracking, but accessing the io context
> information is somewhat expensive:
>
>   page->page_cgroup->bio_cgroup->io_context
>
> If at the time of building a bio we know its io context (i.e. the
> context of the task or cgroup that generated that bio), I think we
> should store it in the bio itself, too. With this scheme, whenever the
> kernel needs to know the io_context of a particular block IO
> operation, it would first try to retrieve the io_context directly from
> the bio and, if not available there, resort to the slow path
> (accessing it through bio_cgroup). My gut feeling is that
> elevator-based IO resource controllers would benefit from such an
> approach, too.

OK, that seems to make sense. Thanks for explaining.

--
Balbir
* Re: RFC: Attaching threads to cgroups is OK?
  From: Vivek Goyal @ 2008-08-22 18:55 UTC
  To: Fernando Luis Vázquez Cao
  Cc: containers, virtualization, menage, balbir

On Thu, Aug 21, 2008 at 02:25:06PM +0900, Fernando Luis Vázquez Cao wrote:
> [...]
> Current IO schedulers such as CFQ only care about the io_context when
> scheduling requests. When a new request comes in, CFQ assumes that it
> originated in the context of the current task, which obviously does
> not hold true for buffered IO and aio. This problem could be solved by
> using bio-cgroup for IO tracking, but accessing the io context
> information is somewhat expensive:
>
>   page->page_cgroup->bio_cgroup->io_context
>
> If at the time of building a bio we know its io context (i.e. the
> context of the task or cgroup that generated that bio), I think we
> should store it in the bio itself, too. With this scheme, whenever the
> kernel needs to know the io_context of a particular block IO
> operation, it would first try to retrieve the io_context directly from
> the bio and, if not available there, resort to the slow path
> (accessing it through bio_cgroup). My gut feeling is that
> elevator-based IO resource controllers would benefit from such an
> approach, too.

Hi Fernando,

Had a question.

IIUC, at the time of submitting the bio, the io_context will be known
only for synchronous requests. For asynchronous requests it will not be
known (e.g. writing dirty pages back to disk), and one will have to
take the longer path (the bio-cgroup chain) to ascertain the io_context
associated with a request.

If that's the case, then it looks like we will always have to traverse
the longer path for asynchronous IO. By putting the io_context pointer
in the bio, we just shift the point of the pointer traversal (from CFQ
to higher layers). So it is probably not worthwhile to put the
io_context pointer in the bio? Am I missing something?

Thanks
Vivek
* Re: RFC: Attaching threads to cgroups is OK?
  From: Fernando Luis Vázquez Cao @ 2008-08-25 10:36 UTC
  To: Vivek Goyal
  Cc: containers, virtualization, menage, balbir

On Fri, 2008-08-22 at 14:55 -0400, Vivek Goyal wrote:
> [...]
> IIUC, at the time of submitting the bio, the io_context will be known
> only for synchronous requests. For asynchronous requests it will not
> be known (e.g. writing dirty pages back to disk), and one will have to
> take the longer path (the bio-cgroup chain) to ascertain the
> io_context associated with a request.
>
> If that's the case, then it looks like we will always have to traverse
> the longer path for asynchronous IO. By putting the io_context pointer
> in the bio, we just shift the point of the pointer traversal (from CFQ
> to higher layers). So it is probably not worthwhile to put the
> io_context pointer in the bio? Am I missing something?

Hi Vivek!

IMHO, optimizing the synchronous path alone would justify the addition
of the io_context to the bio. There is more to this, though.

As you point out, it would seem that aio and buffered IO would not
benefit from caching the io context in the bio itself, but there are
some subtleties here. Let's consider stacking devices and buffered IO,
for example. When a bio enters such a device it may get replicated
several times and, depending on the topology, some other derivative
bios will be created (RAID1 and parity configurations come to mind,
respectively). The problem here is that the memory allocated for the
newly created bios will be owned by the corresponding dm or md kernel
thread, not by the originator of the bio we are replicating or
calculating the parity bits from.

The implication of this is that if we took the longer path (via
bio_cgroup) to obtain the io_context of those bios, we would end up
charging the wrong guy for that IO: the kernel thread, not the
perpetrator of the IO.

A possible solution to this could be to track the original bio inside
the stacking device so that the io context of derivative bios can be
obtained from its bio_cgroup. However, I am afraid such an approach
would be overly complex and slow.

My feeling is that storing the io_context in bios, too, is the right
way to go: once the bio enters the block layer we can forget about
memory-related issues, thus avoiding what is arguably a layering
violation; io context information is not lost inside stacking devices
(we just need to make sure that whenever new bios are created the
io_context is carried over from the original one); and, finally, the
synchronous path can be easily optimized.

I hope this makes sense.

Thank you for your comments.

	- Fernando
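Carrying the context across a stacking driver would then reduce to
copying one pointer when the derivative bio is created, along these
lines (bi_io_context is the same hypothetical field as above):

	static struct bio *clone_bio_keep_context(struct bio *orig,
						  gfp_t gfp_mask)
	{
		struct bio *clone = bio_clone(orig, gfp_mask);

		/* A generic clone knows nothing about the new field,
		 * so the dm/md path must propagate it explicitly. */
		if (clone)
			clone->bi_io_context = orig->bi_io_context;
		return clone;
	}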
* Re: RFC: Attaching threads to cgroups is OK?
  From: Hirokazu Takahashi @ 2008-09-05 11:50 UTC
  To: fernando
  Cc: containers, balbir, menage, vgoyal, virtualization

Hi, fernando,

> [...]
> As you point out, it would seem that aio and buffered IO would not
> benefit from caching the io context in the bio itself, but there are
> some subtleties here. Let's consider stacking devices and buffered IO,
> for example. When a bio enters such a device it may get replicated
> several times and, depending on the topology, some other derivative
> bios will be created (RAID1 and parity configurations come to mind,
> respectively). The problem here is that the memory allocated for the
> newly created bios will be owned by the corresponding dm or md kernel
> thread, not by the originator of the bio we are replicating or
> calculating the parity bits from.

I've already tried implementing this feature. Will you take a look at
the thread whose subject is "I/O context inheritance" at
http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2857?
This code is not merged with bio-cgroup yet, but I believe some of it
will help you implement what you want.

Through this work, I realized that if you want to introduce a
per-device io_context -- so that each cgroup can have several
io_contexts for several devices -- it is impossible to determine which
io_context should be used when a read or write I/O is requested,
because the device is determined right before the request is passed to
the block I/O layer. I mean, a bio is allocated in the VFS, while the
device which handles the I/O request is determined in one of the
underlying filesystems.

> The implication of this is that if we took the longer path (via
> bio_cgroup) to obtain the io_context of those bios, we would end up
> charging the wrong guy for that IO: the kernel thread, not the
> perpetrator of the IO.
> [...]

Thank you,
Hirokazu Takahashi.
* Re: RFC: Attaching threads to cgroups is OK?
  From: Hirokazu Takahashi @ 2008-09-05 12:00 UTC
  To: fernando
  Cc: containers, balbir, menage, vgoyal, virtualization

Hi, fernando,

>> [...]
>
> I've already tried implementing this feature. Will you take a look at
> the thread whose subject is "I/O context inheritance" at
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2857?

When I started to implement this, I made each bio have two io_contexts
-- a per-process io_context, which held the ionice value, and a
per-cgroup io_context.

> This code is not merged with bio-cgroup yet, but I believe some of it
> will help you implement what you want.
> [...]

Thank you,
Hirokazu Takahashi.
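The two-context design mentioned above would amount to something like
this (field names are hypothetical):

	struct bio {
		/* ... existing members ... */
		struct io_context *bi_io_context;    /* per-process (ionice) */
		struct io_context *bi_cg_io_context; /* per-cgroup */
	};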
* Re: RFC: Attaching threads to cgroups is OK?
  From: Vivek Goyal @ 2008-09-05 15:38 UTC
  To: Hirokazu Takahashi
  Cc: menage, fernando, balbir, containers, virtualization

On Fri, Sep 05, 2008 at 09:00:17PM +0900, Hirokazu Takahashi wrote:
> [...]
> When I started to implement this, I made each bio have two io_contexts
> -- a per-process io_context, which held the ionice value, and a
> per-cgroup io_context.

Hi Hirokazu,

I had a question. Why are we trying to create another io_context, or
why are we trying to mix up the existing io_context (which is per task
or per thread group) with cgroups?

To me, we just need to know the "cgroup id" to take a specific action
with a bio. We don't need the whole io_context structure. So can't we
just use something like page->page_cgroup->bio_cgroup->cgroup_id?

What I mean is that the only thing we seem to require to differentiate
between various bios is the cgroup id each belongs to, and that can be
a single "unsigned long" stored in an appropriate place. Why are we
looking at creating a full io_context structure and trying to share it
among all the members of a cgroup?

Thanks
Vivek
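The lighter-weight alternative suggested above would reduce the per-bio
state to a single id, roughly as follows (a sketch: the bio_cgroup
member of page_cgroup follows the bio-cgroup patches, and the id lookup
is illustrative):

	static unsigned long page_cgroup_id(struct page *page)
	{
		struct page_cgroup *pc = page_get_page_cgroup(page);

		/* 0 means "root / not accounted". */
		return (pc && pc->bio_cgroup) ? pc->bio_cgroup->id : 0;
	}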
* Re: RFC: Attaching threads to cgroups is OK?
  From: Takuya Yoshikawa @ 2008-09-08 2:58 UTC
  To: vgoyal
  Cc: containers, virtualization, taka, menage, fernando, balbir

Hi both,

Fernando is off this week, so he may not be able to reply soon, sorry.

Vivek Goyal wrote:
> On Fri, Sep 05, 2008 at 09:00:17PM +0900, Hirokazu Takahashi wrote:
>> [...]
>> When I started to implement this, I made each bio have two io_contexts
>> -- a per-process io_context, which held the ionice value, and a
>> per-cgroup io_context.

I am also trying to make bios have pointers to io_contexts, and I have
a question concerning this, a very simple one.

Is it OK to think of bio->bi_io_context as a cache for finding the
io_context of the appropriate task, the issuer of this bio, which may
sometimes not be the direct one, e.g. in the stacking devices case?

If so, is it the same if we make bios have pointers to their issuers,
tasks or cgroups, and then find the io_contexts through these?

I sometimes feel tempted to say "this bio's io_context."

> Hi Hirokazu,
>
> I had a question. Why are we trying to create another io_context, or
> why are we trying to mix up the existing io_context (which is per task
> or per thread group) with cgroups?
> [...]
>
> Thanks
> Vivek
[parent not found: <48C494EB.4080502@oss.ntt.co.jp>]
* Re: RFC: Attaching threads to cgroups is OK?
  [not found] ` <48C494EB.4080502@oss.ntt.co.jp>
@ 2008-09-08 11:52 ` Hirokazu Takahashi
  0 siblings, 0 replies; 23+ messages in thread
From: Hirokazu Takahashi @ 2008-09-08 11:52 UTC (permalink / raw)
  To: yoshikawa.takuya
  Cc: containers, virtualization, menage, fernando, vgoyal, balbir

Hello,

> Hi both,
>
> Fernando is off this week, so he may not be able to reply soon, sorry.
>
> Vivek Goyal wrote:
> > On Fri, Sep 05, 2008 at 09:00:17PM +0900, Hirokazu Takahashi wrote:
> >> Hi, Fernando,
> >>
> >>>> IMHO, optimizing the synchronous path alone would justify the addition
> >>>> of io_context in bio. There is more to this though.
> >>>>
> >>>> As you point out, it would seem that aio and buffered IO would not
> >>>> benefit from caching the io context in the bio itself, but there are
> >>>> some subtleties here. Let's consider stacking devices and buffered IO,
> >>>> for example. When a bio enters such a device it may get replicated
> >>>> several times and, depending on the topology, some other derivative bios
> >>>> will be created (RAID1 and parity configurations come to mind,
> >>>> respectively). The problem here is that the memory allocated for the
> >>>> newly created bios will be owned by the corresponding dm or md kernel
> >>>> thread, not the originator of the bio we are replicating or calculating
> >>>> the parity bits from.
> >>> I've already tried implementing this feature. Will you take a look
> >>> at the thread whose subject is "I/O context inheritance" in
> >>> http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2857.
> >> When I started to implement this, I would make each bio have two
> >> io_contexts -- a per-process io_context, which had ionice, and a
> >> per-cgroup io_context.
> >>
>
> I am also trying to make bios have pointers to io_contexts, and I have a
> question concerning this, a very simple one.
>
> Is it OK to think of bio->bi_io_context as a cache for finding the io_context
> of the appropriate task, the issuer of this bio, which may sometimes not be
> the direct one, e.g. in the stacking devices case?
>
> If so, would it be the same if we made bios have pointers to their issuers,
> tasks or cgroups, and then found the io_contexts through these?

No, I don't think it is always possible. The issuers may not exist when an
io-scheduler or other modules start to handle the I/O requests. I know you
can make the Linux kernel preserve all the context of the issuers until all
the I/O requests belonging to them have finished, but it may lead to a waste
of resources. I think this is the reason why io_context was introduced.

> I am sometimes tempted to say "this bio's io_context."
>
> >
> > Hi Hirokazu,
> >
> > I had a question. Why are we trying to create another io_context, or why
> > are we trying to mix up the existing io_context (which is per task or per
> > thread group) with cgroups?
> >
> > To me we just need to know the "cgroup id" to take a specific action with
> > a bio. We don't need the whole io_context structure. So can't we just use
> > something like page->page_cgroup->bio_cgroup->cgroup_id, or something like
> > that?
> >
> > What I mean is that the only thing which we seem to require to differentiate
> > between various bios is the cgroup id they belong to, and that can be a single
> > "unsigned long" stored at an appropriate place. Why are we looking at
> > creating a full io_context structure and trying to share it among all the
> > members of a cgroup?
> >
> > Thanks
> > Vivek

^ permalink raw reply	[flat|nested] 23+ messages in thread
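Takahashi's lifetime point can be illustrated with reference counting, which
is how an io_context can outlive its issuer: the bio holds a reference, so
the task's exit does not free the structure. A minimal single-threaded
sketch; the kernel's real get_io_context()/put_io_context() use atomic
counts and differ in detail.

/* Sketch of why a refcounted io_context can outlive its issuer.
 * Simplified single-threaded model, not the kernel implementation. */
#include <stdio.h>
#include <stdlib.h>

struct io_context {
    int refcount;
    int ioprio;
};

static struct io_context *ioc_get(struct io_context *ioc)
{
    ioc->refcount++;
    return ioc;
}

static void ioc_put(struct io_context *ioc)
{
    if (--ioc->refcount == 0) {
        printf("io_context freed\n");
        free(ioc);
    }
}

int main(void)
{
    struct io_context *ioc = malloc(sizeof(*ioc));
    ioc->refcount = 1;                          /* held by the task */
    ioc->ioprio = 4;

    struct io_context *bio_ref = ioc_get(ioc);  /* bio takes a reference */

    ioc_put(ioc);       /* task exits: the io_context is still alive */
    printf("bio still sees ioprio %d\n", bio_ref->ioprio);
    ioc_put(bio_ref);   /* I/O completes: now it is actually freed */
    return 0;
}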
* Re: RFC: Attaching threads to cgroups is OK?
  2008-09-05 15:38 ` Vivek Goyal
  2008-09-08  2:58 ` Takuya Yoshikawa
  [not found] ` <48C494EB.4080502@oss.ntt.co.jp>
@ 2008-09-08 12:47 ` Hirokazu Takahashi
  2 siblings, 0 replies; 23+ messages in thread
From: Hirokazu Takahashi @ 2008-09-08 12:47 UTC (permalink / raw)
  To: vgoyal; +Cc: menage, fernando, balbir, containers, virtualization

Hi, Vivek,

> > > > IMHO, optimizing the synchronous path alone would justify the addition
> > > > of io_context in bio. There is more to this though.
> > > >
> > > > As you point out, it would seem that aio and buffered IO would not
> > > > benefit from caching the io context in the bio itself, but there are
> > > > some subtleties here. Let's consider stacking devices and buffered IO,
> > > > for example. When a bio enters such a device it may get replicated
> > > > several times and, depending on the topology, some other derivative bios
> > > > will be created (RAID1 and parity configurations come to mind,
> > > > respectively). The problem here is that the memory allocated for the
> > > > newly created bios will be owned by the corresponding dm or md kernel
> > > > thread, not the originator of the bio we are replicating or calculating
> > > > the parity bits from.
> > >
> > > I've already tried implementing this feature. Will you take a look
> > > at the thread whose subject is "I/O context inheritance" in
> > > http://www.uwsg.iu.edu/hypermail/linux/kernel/0804.2/index.html#2857.
> >
> > When I started to implement this, I would make each bio have two
> > io_contexts -- a per-process io_context, which had ionice, and a
> > per-cgroup io_context.
> >
>
> Hi Hirokazu,
>
> I had a question. Why are we trying to create another io_context, or why
> are we trying to mix up the existing io_context (which is per task or per
> thread group) with cgroups?
>
> To me we just need to know the "cgroup id" to take a specific action with
> a bio. We don't need the whole io_context structure. So can't we just use
> something like page->page_cgroup->bio_cgroup->cgroup_id, or something like
> that?
>
> What I mean is that the only thing which we seem to require to differentiate
> between various bios is the cgroup id they belong to, and that can be a single
> "unsigned long" stored at an appropriate place. Why are we looking at
> creating a full io_context structure and trying to share it among all the
> members of a cgroup?

It is possible if you only care about dm-ioband. I just thought the approach
that every cgroup should also have its own io_context was more generic, since
both types of io_contexts represent the contexts of issuers, so I think it
would be good if they could be handled in the same way. I think it will make
it easier for other functionalities of the kernel to use this feature.

But we may have to think it over again if it becomes clear that this approach
costs a lot.

Thanks
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 23+ messages in thread
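The two-io_context design Takahashi describes earlier in the thread would
make each bio carry both contexts side by side. A minimal illustrative
declaration; the field names are guesses, not the posted patch.

/* Illustrative only: a bio carrying both a per-process and a per-cgroup
 * io_context, as Takahashi describes.  Field names are hypothetical. */
struct io_context;

struct bio {
    struct io_context *bi_io_context;         /* per-process: ionice etc. */
    struct io_context *bi_cgroup_io_context;  /* per-cgroup: e.g. bandwidth */
    /* ... rest of struct bio ... */
};

An elevator could then consult bi_io_context for priority decisions and
bi_cgroup_io_context for group-level decisions, which is the "handled in the
same way" property Takahashi mentions.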
* Re: RFC: Attaching threads to cgroups is OK?
  [not found] ` <20080822185527.GD27964@redhat.com>
  2008-08-25 10:36 ` Fernando Luis Vázquez Cao
@ 2008-09-12 18:57 ` Vivek Goyal
  1 sibling, 0 replies; 23+ messages in thread
From: Vivek Goyal @ 2008-09-12 18:57 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao; +Cc: containers, virtualization, menage, balbir

On Fri, Aug 22, 2008 at 02:55:27PM -0400, Vivek Goyal wrote:
> On Thu, Aug 21, 2008 at 02:25:06PM +0900, Fernando Luis Vázquez Cao wrote:
> > Hi Balbir,
> >
> > On Thu, 2008-08-21 at 09:02 +0530, Balbir Singh wrote:
> > > Fernando Luis Vázquez Cao wrote:
> > > > On Wed, 2008-08-20 at 20:48 +0900, Hirokazu Takahashi wrote:
> > > >> Hi,
> > > >>
> > > >>>> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
> > > >>>> If we want to use your tracking functions for each thread separately,
> > > >>>> there seems to be a problem.
> > > >>>> ===cf. mm_get_bio_cgroup()===================
> > > >>>>            owner
> > > >>>> mm_struct ----> task_struct ----> bio_cgroup
> > > >>>> =============================================
> > > >>>> In my understanding, the mm_struct of a thread is the same as its parent's.
> > > >>>> So, even if we attach the TIDs of some threads to different cgroups, the
> > > >>>> tracking always returns the same bio_cgroup -- its parent's group.
> > > >>>> Do you have some policy about in which case we can use your tracking?
> > > >>>>
> > > >>> It will be a restriction when the io-controller reuses information about
> > > >>> the owner of memory. But if it's very clear who issues I/O (by tracking
> > > >>> read/write syscalls), we may have a chance to record the issuer of I/O
> > > >>> in the page_cgroup struct.
> > > >> This might be a slightly different topic though,
> > > >> I've been thinking where we should add hooks to track I/O requests.
> > > >> I think the following set of hooks is enough whether we are going to
> > > >> support thread-based cgroups or not.
> > > >>
> > > >> Hook-1: called when allocating a page, where the memory controller
> > > >> already has a hook.
> > > >> Hook-2: called when making a page in page-cache dirty.
> > > >>
> > > >> For anonymous pages, Hook-1 is enough to track any type of I/O request.
> > > >> For pages in page-cache, Hook-1 is also enough for read I/O because
> > > >> the I/O is issued just once, right after allocating the page.
> > > >> For write I/O requests to pages in page-cache, Hook-1 will be okay
> > > >> in most cases, but sometimes a process in another cgroup may write
> > > >> the pages. In this case, Hook-2 is needed to keep the tracking of
> > > >> I/O requests accurate.
> > > >
> > > > This relative simplicity is what prompted me to say that we probably
> > > > should try to disentangle the io tracking functionality from the memory
> > > > controller a bit more (of course we still should reuse as much as we can
> > > > from it). The rationale for this is that the existing I/O scheduler
> > > > would benefit from proper io tracking capabilities too, so it'd be nice
> > > > if we could have them even in non-cgroup-capable kernels.
> > > >
> > >
> > > Hook 2 referred to in the mail above exists today in the form of task IO accounting.
> > Yup.
> >
> > > > As an aside, when the IO context of a certain IO operation is known
> > > > (synchronous IO comes to mind) I think it should be cached in the
> > > > resulting bio so that we can do without the expensive accesses to
> > > > bio_cgroup once it enters the block layer.
> > >
> > > Will this give you everything you need for accounting and control (from the
> > > block layer?)
> >
> > Well, it depends on what you are trying to achieve.
> >
> > Current IO schedulers such as CFQ only care about the io_context when
> > scheduling requests. When a new request comes in, CFQ assumes that it
> > originated in the context of the current task, which obviously does not
> > hold true for buffered IO and aio. This problem could be solved by using
> > bio-cgroup for IO tracking, but accessing the io context information is
> > somewhat expensive:
> >
> > page->page_cgroup->bio_cgroup->io_context.
> >
> > If at the time of building a bio we know its io context (i.e. the
> > context of the task or cgroup that generated that bio) I think we should
> > store it in the bio itself, too. With this scheme, whenever the kernel
> > needs to know the io_context of a particular block IO operation the
> > kernel would first try to retrieve its io_context directly from the bio,
> > and, if not available there, would resort to the slow path (accessing it
> > through bio_cgroup). My gut feeling is that elevator-based IO resource
> > controllers would benefit from such an approach, too.
> >
>
> Hi Fernando,
>
> Had a question.
>
> IIUC, at the time of submitting the bio, the io_context will be known only
> for synchronous requests. For asynchronous requests it will not be known
> (e.g. writing the dirty pages back to disk) and one shall have to take
> the longer path (the bio-cgroup thing) to ascertain the io_context
> associated with a request.
>
> If that's the case, then it looks like we shall have to always traverse the
> longer path in case of asynchronous IO. By putting the io_context pointer
> in the bio, we will just shift the time of pointer traversal (from CFQ to
> higher layers).
>
> So probably it is not worthwhile to put the io_context pointer in the bio?
> Am I missing something?
>

Hi Fernando,

I thought you did not get a chance to reply to this mail, until today when I
found your reply in the virtualization list archive. (I am not on the
virtualization list.) I am assuming that by mistake you replied only to the
virtualization list, or the mail got lost somewhere.

https://lists.linux-foundation.org/pipermail/virtualization/2008-August/011588.html

Anyway, now I understand the issue at hand a little better. CFQ retrieves
the io_context information from "current", and that can be problematic for
any software entity above the elevator which buffers the bios, does some
processing, and then releases them to the elevator. Two cases come to my
mind.

- Any stacked device driver.

- I am also running into issues while I am putting all the requests on an
  rb-tree (per request queue) before releasing them to the elevator. I try
  to control the requests on this rb-tree based on cgroup weights and then
  release them to the elevator. But I realized that CFQ will get the wrong
  io_context information because of the request buffering.

It then probably makes sense to put the io_context information in the bio
and let CFQ retrieve it from the bio instead of "current". This way we drop
the assumption that a bio belongs to the thread submitting it to CFQ, and
it will allow us to trap requests, do some processing, and then submit them
to CFQ without losing the io_context information.

Any kind of cgroup mechanism will also need to map a bio to the respective
cgroup. Because the io_context is tied to the task_struct, one can retrieve
bio->io_context, then the task_struct from it, and then find the cgroup
information and account the bio appropriately.

But this assumes that the task_struct is still around, and that might not
be the case..... Any ideas?

Or maybe we can go the bio->page->pc_page->bio_cgroup->cgroup_id route, but
that would not work very well in the case when stacked devices try to
replicate the bio. As you said, the memory of the new bio will belong to
some kernel thread and the accounting will not be proper.

That leaves me thinking...

Thanks
Vivek

^ permalink raw reply	[flat|nested] 23+ messages in thread
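The buffering problem Vivek describes can be sketched directly: a layer
queues bios in the issuer's context but releases them later from a worker
thread, where "current" is no longer the issuer. The plain list below is a
simplified stand-in for his per-request-queue rb-tree, and the function
names are hypothetical.

/* Sketch of why "current" is wrong after buffering.  A list stands in
 * for Vivek's per-queue rb-tree; all names here are hypothetical. */
#include <stdio.h>

struct io_context { int ioprio; };

struct bio {
    struct io_context *bi_io_context;  /* captured at submission time */
    struct bio *next;
};

static struct bio *pending;  /* stand-in for the per-queue rb-tree */

/* Called in the issuer's context: capture its io_context in the bio. */
static void buffer_bio(struct bio *bio, struct io_context *issuer_ioc)
{
    bio->bi_io_context = issuer_ioc;
    bio->next = pending;
    pending = bio;
}

/* Called later, possibly from a worker thread with a different context. */
static void release_bios(struct io_context *worker_ioc)
{
    for (struct bio *b = pending; b; b = b->next) {
        /* Using worker_ioc ("current") here would be wrong; the cached
         * pointer still identifies the real issuer. */
        printf("dispatch with ioprio %d (worker has %d)\n",
               b->bi_io_context->ioprio, worker_ioc->ioprio);
    }
    pending = NULL;
}

int main(void)
{
    struct io_context issuer = { .ioprio = 2 }, worker = { .ioprio = 7 };
    struct bio b = { 0 };

    buffer_bio(&b, &issuer);
    release_bios(&worker);
    return 0;
}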
* Re: RFC: Attaching threads to cgroups is OK?
  [not found] <48AAA296.8050802@oss.ntt.co.jp>
  2008-08-19 11:22 ` RFC: Attaching threads to cgroups is OK? KAMEZAWA Hiroyuki
  [not found] ` <20080819202237.edd75933.kamezawa.hiroyu@jp.fujitsu.com>
@ 2008-08-20  7:41 ` Hirokazu Takahashi
  2 siblings, 0 replies; 23+ messages in thread
From: Hirokazu Takahashi @ 2008-08-20 7:41 UTC (permalink / raw)
  To: yoshikawa.takuya
  Cc: containers, Fernando Luis Vázquez Cao, fernando, virtualization

Hi,

> Hi everyone,
>
> I have a question about cgroup's policy concerning the treatment of
> threads. Please consider the case where we want to attach an application
> which already has some threads to a certain cgroup. If we echo the pid of
> this application to the "tasks" file connected to this cgroup, the threads
> belonging to this application will NOT be moved to the new group. Is that
> right? If so, is it OK?
>
> I mean, in the current implementation, threads created before the
> attachment of the parent process are not treated equally to those
> created after.
>
> Could you tell me if you know something about the rules of attachment
> of a pid, or tid, to cgroups -- what ID is OK to write to the "tasks" file
> and what we can expect as a result?

FYI, nothing will happen in case you just move a process or thread, since
the current implementation of the memory controller hasn't supported this
feature yet. This restriction won't be removed unless we make pages able
to move between cgroups.

> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
> If we want to use your tracking functions for each thread separately,
> there seems to be a problem.
> ===cf. mm_get_bio_cgroup()===================
>            owner
> mm_struct ----> task_struct ----> bio_cgroup
> =============================================
> In my understanding, the mm_struct of a thread is the same as its parent's.
> So, even if we attach the TIDs of some threads to different cgroups, the
> tracking always returns the same bio_cgroup -- its parent's group.
> Do you have some policy about in which case we can use your tracking?
>
> Thanks,
> -- Takuya Yoshikawa

Thank you,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 23+ messages in thread
* RFC: Attaching threads to cgroups is OK?
@ 2008-08-19 10:38 Takuya Yoshikawa
0 siblings, 0 replies; 23+ messages in thread
From: Takuya Yoshikawa @ 2008-08-19 10:38 UTC (permalink / raw)
To: containers, virtualization; +Cc: Fernando Luis Vázquez Cao
Hi everyone,
I have a question about cgroup's policy concerning the treatment of
threads. Please consider the case where we want to attach an application
which already has some threads to a certain cgroup. If we echo the pid of
this application to the "tasks" file connected to this cgroup, the threads
belonging to this application will NOT be moved to the new group. Is that
right? If so, is it OK?
I mean, in the current implementation, threads created before the
attachment of the parent process are not treated equally to those
created after.
Could you tell me if you know something about the rules of attachment
of a pid, or tid, to cgroups -- what ID is OK to write to the "tasks" file
and what we can expect as a result?
Tsuruta-san, how about your bio-cgroup's tracking concerning this?
If we want to use your tracking functions for each thread separately,
there seems to be a problem.
===cf. mm_get_bio_cgroup()===================
           owner
mm_struct ----> task_struct ----> bio_cgroup
=============================================
In my understanding, the mm_struct of a thread is the same as its parent's.
So, even if we attach the TIDs of some threads to different cgroups, the
tracking always returns the same bio_cgroup -- its parent's group.
Do you have some policy about in which case we can use your tracking?
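To make the concern concrete, here is a minimal userspace sketch of the
chain in the diagram above. The struct layouts are simplified stand-ins for
the kernel types; only the mm->owner indirection matters here.

/* Minimal sketch of the mm_get_bio_cgroup() chain above.  Struct layouts
 * are simplified stand-ins, not the real kernel definitions. */
#include <stdio.h>

struct bio_cgroup { unsigned long id; };

struct task_struct { struct bio_cgroup *bio_cgroup; };

struct mm_struct { struct task_struct *owner; };  /* owner, not current */

static struct bio_cgroup *mm_get_bio_cgroup(struct mm_struct *mm)
{
    return mm->owner->bio_cgroup;
}

int main(void)
{
    struct bio_cgroup group_a = { .id = 1 }, group_b = { .id = 2 };
    struct task_struct leader = { .bio_cgroup = &group_a };
    struct task_struct thread = { .bio_cgroup = &group_b };
    struct mm_struct mm = { .owner = &leader };  /* shared by both threads */

    /* The thread was attached to group 2, but tracking via the shared
     * mm_struct still resolves to the owner's group 1. */
    printf("thread's cgroup: %lu, tracked cgroup: %lu\n",
           thread.bio_cgroup->id, mm_get_bio_cgroup(&mm)->id);
    return 0;
}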
Thanks,
-- Takuya Yoshikawa
^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-09-12 18:57 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <48AAA296.8050802@oss.ntt.co.jp>
2008-08-19 11:22 ` RFC: Attaching threads to cgroups is OK? KAMEZAWA Hiroyuki
[not found] ` <20080819202237.edd75933.kamezawa.hiroyu@jp.fujitsu.com>
2008-08-19 12:27 ` Balbir Singh
[not found] ` <48AABC31.7070207@linux.vnet.ibm.com>
2008-08-19 12:52 ` Fernando Luis Vázquez Cao
2008-08-20 5:52 ` Takuya Yoshikawa
[not found] ` <1219150334.14590.12.camel@sebastian.kern.oss.ntt.co.jp>
2008-08-20 7:12 ` Hirokazu Takahashi
[not found] ` <20080820.161247.64324924.taka@valinux.co.jp>
2008-08-20 8:43 ` KAMEZAWA Hiroyuki
2008-08-22 1:03 ` Takuya Yoshikawa
2008-08-20 11:48 ` Hirokazu Takahashi
[not found] ` <20080820.204832.131207708.taka@valinux.co.jp>
2008-08-21 3:08 ` Fernando Luis Vázquez Cao
[not found] ` <1219288081.28324.30.camel@sebastian.kern.oss.ntt.co.jp>
2008-08-21 3:32 ` Balbir Singh
[not found] ` <48ACE1B4.8010000@linux.vnet.ibm.com>
2008-08-21 5:25 ` Fernando Luis Vázquez Cao
[not found] ` <1219296306.28324.82.camel@sebastian.kern.oss.ntt.co.jp>
2008-08-21 10:28 ` Balbir Singh
2008-08-22 18:55 ` Vivek Goyal
[not found] ` <20080822185527.GD27964@redhat.com>
2008-08-25 10:36 ` Fernando Luis Vázquez Cao
2008-09-05 11:50 ` Hirokazu Takahashi
[not found] ` <20080905.205016.28412219.taka@valinux.co.jp>
2008-09-05 12:00 ` Hirokazu Takahashi
[not found] ` <20080905.210017.44596963.taka@valinux.co.jp>
2008-09-05 15:38 ` Vivek Goyal
2008-09-08 2:58 ` Takuya Yoshikawa
[not found] ` <48C494EB.4080502@oss.ntt.co.jp>
2008-09-08 11:52 ` Hirokazu Takahashi
2008-09-08 12:47 ` Hirokazu Takahashi
2008-09-12 18:57 ` Vivek Goyal
2008-08-20 7:41 ` Hirokazu Takahashi
2008-08-19 10:38 Takuya Yoshikawa