diff for duplicates of <489315B2.2080506@linux.vnet.ibm.com>

diff --git a/a/1.txt b/N1/1.txt
index 069dbc1..2026558 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -1,121 +1,241 @@
 Hi, All,
 
+
+
 This is the first part of the resource management and control groups discussion.
+
 I might have made mistakes while taking notes or typing them out, please feel
+
 free to correct them for me or send me corrections.
 
+
+
 The notes are really large, so they'll come in installments. This is the first
+
 part of the notes.
 
+
+
 Control Groups
+
 ==============
 
+
+
 1. Multiphase locking - Paul brought up his multi phase locking design and
+
 suggested approaches to implementing them. The problem with control groups
+
 currently is that transactions cannot be atomically committed. If some
+
 transactions fail (can_attach() callback fails or returns error), then there is
+
 no notification sent out to groups that already committed the transaction
 
+
+
 The suggested design includes
+
 	- Acquiring locks across callbacks - Balbir opposed this approach
+
           stating that this would make it easier for subsystems to deadlock.
+
           Balbir instead suggested that each callback hold it's own lock and
+
           add an undo operation that cannot fail (returns void), since
+
           uncharging usually succeeds. Dave suggested doing undo without holding
+
           any locks.
 
+
+
 2. Procs - Balbir and others have asked for an API to move all threads of a
+
 process in one go from one control group to another. The question about doing it
+
 in user space was asked. Doing it in user space is easy, but it can be expensive
+
 (moving all threads one by one - acquiring the cgroup lock and releasing it for
+
 every thread). What happens if another move is requested while a partial move is
+
 in progress? Dave suggested that we have an abstract aggregation so that we
+
 don't need to keep adding interfaces for every aggregation. Balbir mentioned
+
 that the aggregation of interest are process, process groups and sessions and
+
 the kernel already knows about these (there are data structures to link all
+
 elements together). Abstracting it is a good idea, but hard to implement.
 
+
+
 Paul asked what the behaviour should be, if a process being moved has several
+
 threads belong to different cgroups. The answer that came up was that they
+
 should all be migrated to the destination cgroup
 
+
+
 3. Cgroup lock - The cgroup lock is held at various places in the system. The
+
 question is -- is cgroup_lock() becoming the next BKL? Several solutions were
+
 discussed - making the lock per hierarchy or per cgroup or use subsystem locks.
+
 Paul mentioned that cgroups already use RCU.
 
+
+
 4. Binary statistics - The question about binary statistics was raised. Since
+
 control groups don't enforce any particular kind of API, is there a way to
+
 generically handle control files and their parameters in the library? Paul
+
 suggested his binary API approach, where every control group and it's API is
+
 documented in an api file. Eric suggested using an ASCII interface (since that
+
 is very generic) and using one file per API. Balbir mentioned that this will
+
 lead to too many dentries and issues related to having extensive number of dentries.
 
+
+
 5. User space notifications - Kamezawa had requested for user space notification
+
 (through inotify) when a control group reaches it's memory limit for example.
+
 The questions that were asked were, what happens if no one is listening in on
+
 notifications? Denis suggested using a FIFO mechanism. Balbir suggested using
+
 netlinks and building stuff on top of cgroupstats. With netlink we can pass
+
 type, value and length of arguments, making it more suitable for this kind of
+
 information exchange. The only concern with netlink is that it can lose
+
 messages. The general consensus was to add one FIFO per control group and use
+
 that for all notifications related to the control group.
 
+
+
 Resource management
+
 ===================
+
 1. Memory controller - Balbir mentioned that this is best discussed at the
+
 memory controller BoF
+
 2. Device subsystem was discussed and it was decided that mount (filesystem)
+
 namespace and device namespace are the best places to handle device subsystem
+
 issues.
+
 3. Memrlimit - Balbir discussed the memrlimit controller. Dave and Paul are
+
 opposed to doing any limits based on virtual address space. Balbir mentioned
+
 that it serves several purposes
 
+
+
 a. It allows us to control swap usage
+
 b. It allows us to build a generic rlimits infrastructure
+
 c. It allows us to fail applications nicely
 
+
+
 Paul mentioned that (c) was not useful since no applications handle it today.
+
 Balbir disagreed with that argument as being sufficient to prevent future
+
 applications to handle malloc()/mmap() failure. Balbir asked why overcommit
+
 accounting was not useful?
 
+
+
 There was general agreement that a mlock() controller would be useful.
 
+
+
 4. CPU controller - There was a request for hard limit feature. Peter opposed
+
 the approach stating that anyone wanting hard limits should use the real time
+
 group scheduler and a new EDF scheduler is being implemented. Denis mentioned
+
 that without hard limits it is not possible for a service provider to
+
 decide/plan how much capacity a single CPU can provide. Balbir mentioned that
+
 with hard limits and SLA's the service provider could on reaching the hard limit
+
 can save power by hard limiting execution on a CPU that is meeting its SLA
+
 requirements. Peter mentioned that hard limits would make the group scheduler,
+
 non work conserving.
 
+
+
 Peter also updated everyone about the new load balancing patches that will make
+
 it into the next merge window.
 
+
+
 5. Kernel memory controller - The kernel memory controller was discussed
+
 briefly. Pavel has not been actively working on it. Denis mentioned that it
+
 would be nice to have a network buffer controller as well. Questions were asked
+
 if the kernel memory controller should be merged with the existing memory
+
 controller?
 
+
+
 6. Swap subsystem - Daisuke mentioned that the swap subsystem works well for
+
 fundamental operations and that he posted a version of the patch three weeks
+
 ago. The patch controls swap entries to control the swap usage of a control
+
 group. Paul mentioned that google has a patch internally to link swap files to
+
 cpusets. Balbir asked Serge about his swap namespace patches. The swap namespace
+
 is a different issue all together (compared to the swap controller). Currently
+
 the swap controller is a part of the memory controller. There has been some
+
 discussion about it being an independent controller.
 
 
 
+
+
+
+
 -- 
+
 	Warm Regards,
+
 	Balbir Singh
+
 	Linux Technology Center
+
 	IBM, ISTL
diff --git a/a/content_digest b/N1/content_digest
index c115a4a..b68256d 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -1,129 +1,249 @@
- "From\0Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>\0"
+ "From\0Balbir Singh <balbir@linux.vnet.ibm.com>\0"
  "Subject\0Control groups and Resource Management notes (part I)\0"
  "Date\0Fri, 01 Aug 2008 19:24:58 +0530\0"
- "To\0Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>\0"
+ "To\0Linux Containers <containers@lists.osdl.org>\0"
  "\00:1\0"
  "b\0"
  "Hi, All,\n"
  "\n"
+ "\n"
+ "\n"
  "This is the first part of the resource management and control groups discussion.\n"
+ "\n"
  "I might have made mistakes while taking notes or typing them out, please feel\n"
+ "\n"
  "free to correct them for me or send me corrections.\n"
  "\n"
+ "\n"
+ "\n"
  "The notes are really large, so they'll come in installments. This is the first\n"
+ "\n"
  "part of the notes.\n"
  "\n"
+ "\n"
+ "\n"
  "Control Groups\n"
+ "\n"
  "==============\n"
  "\n"
+ "\n"
+ "\n"
  "1. Multiphase locking - Paul brought up his multi phase locking design and\n"
+ "\n"
  "suggested approaches to implementing them. The problem with control groups\n"
+ "\n"
  "currently is that transactions cannot be atomically committed. If some\n"
+ "\n"
  "transactions fail (can_attach() callback fails or returns error), then there is\n"
+ "\n"
  "no notification sent out to groups that already committed the transaction\n"
  "\n"
+ "\n"
+ "\n"
  "The suggested design includes\n"
+ "\n"
  "\t- Acquiring locks across callbacks - Balbir opposed this approach\n"
+ "\n"
  "          stating that this would make it easier for subsystems to deadlock.\n"
+ "\n"
  "          Balbir instead suggested that each callback hold it's own lock and\n"
+ "\n"
  "          add an undo operation that cannot fail (returns void), since\n"
+ "\n"
  "          uncharging usually succeeds. Dave suggested doing undo without holding\n"
+ "\n"
  "          any locks.\n"
  "\n"
+ "\n"
+ "\n"
  "2. Procs - Balbir and others have asked for an API to move all threads of a\n"
+ "\n"
  "process in one go from one control group to another. The question about doing it\n"
+ "\n"
  "in user space was asked. Doing it in user space is easy, but it can be expensive\n"
+ "\n"
  "(moving all threads one by one - acquiring the cgroup lock and releasing it for\n"
+ "\n"
  "every thread). What happens if another move is requested while a partial move is\n"
+ "\n"
  "in progress? Dave suggested that we have an abstract aggregation so that we\n"
+ "\n"
  "don't need to keep adding interfaces for every aggregation. Balbir mentioned\n"
+ "\n"
  "that the aggregation of interest are process, process groups and sessions and\n"
+ "\n"
  "the kernel already knows about these (there are data structures to link all\n"
+ "\n"
  "elements together). Abstracting it is a good idea, but hard to implement.\n"
  "\n"
+ "\n"
+ "\n"
  "Paul asked what the behaviour should be, if a process being moved has several\n"
+ "\n"
  "threads belong to different cgroups. The answer that came up was that they\n"
+ "\n"
  "should all be migrated to the destination cgroup\n"
  "\n"
+ "\n"
+ "\n"
  "3. Cgroup lock - The cgroup lock is held at various places in the system. The\n"
+ "\n"
  "question is -- is cgroup_lock() becoming the next BKL? Several solutions were\n"
+ "\n"
  "discussed - making the lock per hierarchy or per cgroup or use subsystem locks.\n"
+ "\n"
  "Paul mentioned that cgroups already use RCU.\n"
  "\n"
+ "\n"
+ "\n"
  "4. Binary statistics - The question about binary statistics was raised. Since\n"
+ "\n"
  "control groups don't enforce any particular kind of API, is there a way to\n"
+ "\n"
  "generically handle control files and their parameters in the library? Paul\n"
+ "\n"
  "suggested his binary API approach, where every control group and it's API is\n"
+ "\n"
  "documented in an api file. Eric suggested using an ASCII interface (since that\n"
+ "\n"
  "is very generic) and using one file per API. Balbir mentioned that this will\n"
+ "\n"
  "lead to too many dentries and issues related to having extensive number of dentries.\n"
  "\n"
+ "\n"
+ "\n"
  "5. User space notifications - Kamezawa had requested for user space notification\n"
+ "\n"
  "(through inotify) when a control group reaches it's memory limit for example.\n"
+ "\n"
  "The questions that were asked were, what happens if no one is listening in on\n"
+ "\n"
  "notifications? Denis suggested using a FIFO mechanism. Balbir suggested using\n"
+ "\n"
  "netlinks and building stuff on top of cgroupstats. With netlink we can pass\n"
+ "\n"
  "type, value and length of arguments, making it more suitable for this kind of\n"
+ "\n"
  "information exchange. The only concern with netlink is that it can lose\n"
+ "\n"
  "messages. The general consensus was to add one FIFO per control group and use\n"
+ "\n"
  "that for all notifications related to the control group.\n"
  "\n"
+ "\n"
+ "\n"
  "Resource management\n"
+ "\n"
  "===================\n"
+ "\n"
  "1. Memory controller - Balbir mentioned that this is best discussed at the\n"
+ "\n"
  "memory controller BoF\n"
+ "\n"
  "2. Device subsystem was discussed and it was decided that mount (filesystem)\n"
+ "\n"
  "namespace and device namespace are the best places to handle device subsystem\n"
+ "\n"
  "issues.\n"
+ "\n"
  "3. Memrlimit - Balbir discussed the memrlimit controller. Dave and Paul are\n"
+ "\n"
  "opposed to doing any limits based on virtual address space. Balbir mentioned\n"
+ "\n"
  "that it serves several purposes\n"
  "\n"
+ "\n"
+ "\n"
  "a. It allows us to control swap usage\n"
+ "\n"
  "b. It allows us to build a generic rlimits infrastructure\n"
+ "\n"
  "c. It allows us to fail applications nicely\n"
  "\n"
+ "\n"
+ "\n"
  "Paul mentioned that (c) was not useful since no applications handle it today.\n"
+ "\n"
  "Balbir disagreed with that argument as being sufficient to prevent future\n"
+ "\n"
  "applications to handle malloc()/mmap() failure. Balbir asked why overcommit\n"
+ "\n"
  "accounting was not useful?\n"
  "\n"
+ "\n"
+ "\n"
  "There was general agreement that a mlock() controller would be useful.\n"
  "\n"
+ "\n"
+ "\n"
  "4. CPU controller - There was a request for hard limit feature. Peter opposed\n"
+ "\n"
  "the approach stating that anyone wanting hard limits should use the real time\n"
+ "\n"
  "group scheduler and a new EDF scheduler is being implemented. Denis mentioned\n"
+ "\n"
  "that without hard limits it is not possible for a service provider to\n"
+ "\n"
  "decide/plan how much capacity a single CPU can provide. Balbir mentioned that\n"
+ "\n"
  "with hard limits and SLA's the service provider could on reaching the hard limit\n"
+ "\n"
  "can save power by hard limiting execution on a CPU that is meeting its SLA\n"
+ "\n"
  "requirements. Peter mentioned that hard limits would make the group scheduler,\n"
+ "\n"
  "non work conserving.\n"
  "\n"
+ "\n"
+ "\n"
  "Peter also updated everyone about the new load balancing patches that will make\n"
+ "\n"
  "it into the next merge window.\n"
  "\n"
+ "\n"
+ "\n"
  "5. Kernel memory controller - The kernel memory controller was discussed\n"
+ "\n"
  "briefly. Pavel has not been actively working on it. Denis mentioned that it\n"
+ "\n"
  "would be nice to have a network buffer controller as well. Questions were asked\n"
+ "\n"
  "if the kernel memory controller should be merged with the existing memory\n"
+ "\n"
  "controller?\n"
  "\n"
+ "\n"
+ "\n"
  "6. Swap subsystem - Daisuke mentioned that the swap subsystem works well for\n"
+ "\n"
  "fundamental operations and that he posted a version of the patch three weeks\n"
+ "\n"
  "ago. The patch controls swap entries to control the swap usage of a control\n"
+ "\n"
  "group. Paul mentioned that google has a patch internally to link swap files to\n"
+ "\n"
  "cpusets. Balbir asked Serge about his swap namespace patches. The swap namespace\n"
+ "\n"
  "is a different issue all together (compared to the swap controller). Currently\n"
+ "\n"
  "the swap controller is a part of the memory controller. There has been some\n"
+ "\n"
  "discussion about it being an independent controller.\n"
  "\n"
  "\n"
  "\n"
+ "\n"
+ "\n"
+ "\n"
+ "\n"
  "-- \n"
+ "\n"
  "\tWarm Regards,\n"
+ "\n"
  "\tBalbir Singh\n"
+ "\n"
  "\tLinux Technology Center\n"
+ "\n"
  "\tIBM, ISTL"
 
-7eae0cb133fbf36c642079ace58a9085dfc5c85dae7e4f55ee1375200d057061
+fe2084aac6c42b5a04f2929bc7f860dd5b8be866ef5867ae0d5a39c3c7c0136f
