* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
       [not found] <20060321061333.27638.63963.stgit@localhost.localdomain>
@ 2006-03-21 18:50 ` Dave Hansen
  2006-03-21 21:08   ` Sam Vilain
  2006-03-22  6:41   ` Eric W. Biederman
       [not found] ` <20060321061333.27638.9112.stgit@localhost.localdomain>
  1 sibling, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2006-03-21 18:50 UTC
To: Sam Vilain
Cc: linux-kernel, Herbert Poetzl, Eric W.Biederman,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

On Tue, 2006-03-21 at 18:13 +1200, Sam Vilain wrote:
> Here is a work in progress of trying to extract some of the core vserver
> architecture and present it as an incremental set of patches.

Hi Sam,

These patches are certainly getting better and better broken out all the
time. Nice work.

But, I worry that they just aren't generic enough yet. I don't see any
response from any of the other "container/namespace/vps" people. I fear
that this means that they don't look broadly useful enough, yet.

That said, at this point, I'd just about rather have _anything_ merged
than the nothing we have at this point. As we throw patches back and
forth, we can't seem to agree on even some very small points.

I also have a sinking feeling that everybody has gone back off and
continues to develop their own out-of-tree functionality, deepening the
patch divide.

Is there anything we could merge that we _all_ don't like? I'm pretty
convinced that no single solution will support Eric's, OpenVZ's, and
VServer's _existing_ usage models. Somebody is going to have to bend,
or nothing will ever get merged. Any volunteers? ;)

What about going back to the very simple "struct container" on which to
build?

http://lkml.org/lkml/2006/2/3/205

-- Dave
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-21 18:50 ` [RFC] [PATCH 0/7] Some basic vserver infrastructure Dave Hansen
@ 2006-03-21 21:08   ` Sam Vilain
  2006-03-21 21:32     ` Dave Hansen
  2006-03-22  6:41   ` Eric W. Biederman
  1 sibling, 1 reply; 29+ messages in thread
From: Sam Vilain @ 2006-03-21 21:08 UTC
To: Dave Hansen
Cc: linux-kernel, Herbert Poetzl, Eric W.Biederman,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

Dave Hansen wrote:

>That said, at this point, I'd just about rather have _anything_ merged
>than the nothing we have at this point. As we throw patches back and
>forth, we can't seem to agree on even some very small points.
>
>I also have a sinking feeling that everybody has gone back off and
>continues to develop their own out-of-tree functionality, deepening the
>patch divide.
>
>Is there anything we could merge that we _all_ don't like? I'm pretty
>convinced that no single solution will support Eric's, OpenVZ's, and
>VServer's _existing_ usage models. Somebody is going to have to bend,
>or nothing will ever get merged. Any volunteers? ;)
>
>

I don't think they're all that different conceptually. If something as
simple as this was merged then we'd at least have a common structure to
throw things in, and a common syscall infrastructure that can gracefully
handle kernel API versioning without requiring dozens of syscalls.

>What about going back to the very simple "struct container" on which to
>build?
>
>

Please read "vx_info" as "container" (or your preferred term). I
decided to punt on the naming issue and copy Herbert :-).

And also because the acronym "vx" makes the API look nice, at least to
mine and Herbert's eyes, then when you go to the network virtualisation
you get "nx_info", etc. However I'm thinking any of these terms might
also be right:

  - "vserver" spelt in full
  - family
  - container
  - jail
  - task_ns (short for namespace)

Perhaps we can get a ruling from core team on this one, as it's
aesthetics :-).
> http://lkml.org/lkml/2006/2/3/205
>
>

This patch is simple, but does not handle SMP scalability very well
(you'll get a lot of cacheline problems when you start actually using
the container structure; the hashing helps a lot there), and does not
provide functions such as looking up a container by ID etc.

I think Herbert's context.c is a pretty nice basic set of functions for
dealing with container-like things.

Thanks for your feedback!

Sam.
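Sam's remark about a common syscall infrastructure that handles API
versioning "without requiring dozens of syscalls" can be pictured as a
single multiplexed entry point whose command word carries both a command
id and a version. The following is a user-space sketch written purely
for illustration: the MKCMD() encoding, the command numbers, API_VERSION
and container_syscall() are all assumptions, not the interface from the
posted patches.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical encoding: command id in the high 16 bits, version in the low 16. */
#define MKCMD(id, ver) ((((uint32_t)(id) & 0xFFFFu) << 16) | ((uint32_t)(ver) & 0xFFFFu))
#define CMD_ID(cmd)    ((uint32_t)(cmd) >> 16)
#define CMD_VER(cmd)   ((uint32_t)(cmd) & 0xFFFFu)

/* Hypothetical command numbers. */
enum { CMD_GET_VERSION = 0, CMD_CREATE = 1 };

/* Highest command version this (imaginary) kernel understands. */
#define API_VERSION 2

/*
 * One entry point for every container operation.  New commands or new
 * versions of a command extend the switch instead of adding syscalls.
 */
long container_syscall(uint32_t cmd, uint32_t id)
{
    switch (CMD_ID(cmd)) {
    case CMD_GET_VERSION:
        /* Lets userland discover what it may ask for. */
        return API_VERSION;
    case CMD_CREATE:
        /* Versions 1..API_VERSION are accepted, anything else is rejected. */
        if (CMD_VER(cmd) == 0 || CMD_VER(cmd) > API_VERSION)
            return -EINVAL;
        return (long)id;    /* stand-in for "container created" */
    default:
        return -ENOSYS;     /* unknown command */
    }
}
```

A caller would first issue the get-version command and then choose the
newest command version both sides support, so older userland keeps
working when the kernel grows new versions.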
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-21 21:08 ` Sam Vilain
@ 2006-03-21 21:32   ` Dave Hansen
  2006-03-21 23:12     ` Sam Vilain
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Hansen @ 2006-03-21 21:32 UTC
To: Sam Vilain
Cc: linux-kernel, Herbert Poetzl, Eric W.Biederman,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

On Wed, 2006-03-22 at 09:08 +1200, Sam Vilain wrote:
> Dave Hansen wrote:
> >What about going back to the very simple "struct container" on which to
> >build?
>
> Please read "vx_info" as "container" (or your preferred term). I
> decided to punt on the naming issue and copy Herbert :-).

My point was that we go back to something simple which we can all
understand and build on. The code which was just posted is quite
complex. Although I trust that most of it is needed, the justification
for the complexity simply is not there.

By starting painfully simply, we can build on complexity in bits, and
justify it as we go.

> And also because the acronym "vx" makes the API look nice, at least to
> mine and Herbert's eyes, then when you go to the network virtualisation
> you get "nx_info", etc. However I'm thinking any of these terms might
> also be right:
>
> - "vserver" spelt in full
> - family
> - container
> - jail
> - task_ns (short for namespace)
>
> Perhaps we can get a ruling from core team on this one, as it's
> aesthetics :-).

I was in a meeting with a few coworkers, and we were arguing a bit about
naming. One person there was a manager-type who didn't have any direct
involvement in the project. We asked him which naming was more clear.

We need to think a bit like that. What is more clear to somebody who
has never read the code? (Hint "vx_" means nothing. :)

> > http://lkml.org/lkml/2006/2/3/205
> >
> This patch is simple, but does not handle SMP scalability very well
> (you'll get a lot of cacheline problems when you start actually using
> the container structure; the hashing helps a lot there)

Could you elaborate a bit on this one? What has cacheline problems?

> and does not provide functions such as looking up a container by ID etc.

We need something so simple that we probably don't even deal with ids.
I believe that Eric claims that we don't really need container _ids_.
For instance, the filesystem namespaces have no ids, and work just fine.

-- Dave
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-21 21:32 ` Dave Hansen
@ 2006-03-21 23:12   ` Sam Vilain
  2006-03-22  5:18     ` Sam Vilain
  2006-03-22  7:13     ` Eric W. Biederman
  0 siblings, 2 replies; 29+ messages in thread
From: Sam Vilain @ 2006-03-21 23:12 UTC
To: Dave Hansen
Cc: linux-kernel, Herbert Poetzl, Eric W.Biederman,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

Dave Hansen wrote:

>>This patch is simple, but does not handle SMP scalability very well
>>(you'll get a lot of cacheline problems when you start actually using
>>the container structure; the hashing helps a lot there)
>>
>>
>Could you elaborate a bit on this one? What has cacheline problems?
>
>

OK, on reflection this probably doesn't belong this early in the
series. It only helps speed up lookup by IDs which is only an SMP
speed enhancement if you expect a lot of those (eg, when storing the
XIDs on another subsystem, such as a filesystem) and unusual things like
creating/destroying a lot of vservers quickly. I'll pull it out.

I tried to make it as basic as possible, but my basic problem with the
patch you linked to is that it shouldn't be so small that you don't get
a decent bunch of the internal API functions listed in description of
part 1 of the set
(http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc4-vsi/01-VServer-Umbrella.diff)

>>and does not provide functions such as looking up a container by ID etc.
>>
>>
>
>We need something so simple that we probably don't even deal with ids.
>I believe that Eric claims that we don't really need container _ids_.
>For instance, the filesystem namespaces have no ids, and work just fine.
>
>

For actual namespace structures themselves that are virtualising real
things, I agree entirely.

But I think the whole point of this patchset (which sadly vger ate for
the wider audience, and I still don't know why) is to group tasks and to
give userland, and hence the administrator, something tangible with
which to group processes and tack namespace structures onto. I can
split out the ID related stuff into another patch, but it would need to
go back in before a useful syscall interface can be added. I'll
consider it anyway, though.

Note that a given vx_info structure still might share namespace
structures with other vx_info objects - this is what I was alluding to
with the "we haven't made any restrictions on the nature of
virtualisation yet" comment. All we've made is the (XID, PID) tuple
(although the default for XID=0 is that all processes are visible in one
big PID space). The intention is that flags will control how you want
your PIDs to work for that vserver.

The only thing that they can't do is share processes - it is a one to
many relationship. However, if there can be a parent/child
containership relationship on the XIDs themselves, you can still achieve
the behaviour of these unusual situations. I'm working on the
assumption that if we ever have to migrate trees of XIDs then we can
virtualise how they look inside a vserver using flags.

Eric, perhaps you can comment or refer to the earlier post where you
made this argument. I tried to follow it but perhaps I missed the gist
of one of the messages or one of the most important messages entirely.

>By starting painfully simply, we can build on complexity in bits, and
>justify it as we go.
>
>

This is my aim too! I'll keep chopping and changing.

>>And also because the acronym "vx" makes the API look nice, at least to
>>mine and Herbert's eyes, then when you go to the network virtualisation
>>you get "nx_info", etc.
>>However I'm thinking any of these terms might
>>also be right:
>>
>> - "vserver" spelt in full
>> - family
>> - container
>> - jail
>> - task_ns (short for namespace)
>>
>>Perhaps we can get a ruling from core team on this one, as it's
>>aesthetics :-).
>>
>>
>
>I was in a meeting with a few coworkers, and we were arguing a bit about
>naming. One person there was a manager-type who didn't have any direct
>involvement in the project. We asked him which naming was more clear.
>
>We need to think a bit like that. What is more clear to somebody who
>has never read the code? (Hint "vx_" means nothing. :)
>
>

OK, so let's look at all of the various names that stem from the use of
the term "XID" and try to come up with a good naming system for each. I
invite the OpenVZ team, Eric and anyone else to put forward their names
as well.

Sorry if this is a bit long. Again I invite anyone at all to come
forward with a preference or list another set of alternatives. Let's
nail this one down, it comes up every time any patchset like this is put
forward.

Linux-VServer:

  CONFIG_VSERVER - config option
  typedef unsigned int xid_t;
  struct vx_info;
  task_struct->vx_info
  task_struct->xid
  vx_task_xid(struct task*) - get an XID from a task_struct
  vx_current_xid - get XID for current
  vx_info_state(struct vx_info*, VXS_FOO) - does vx_info have state
  create_vx_info - creates a new context and "hashes" it
  lookup_vx_info - lookup a vx_info by xid
  get_vx_info - increase refcount of a vx_info
  [...]
  release_vx_info - decrease the process count for a vx_info
  task_get_vx_info - like get_vx_info, but by process
  vx_migrate_task - join task to a vx_info
  vxlprintk - debugging printk (for CONFIG_VSERVER_DEBUG)
  vxh_alloc_vx_info - history tracing (for CONFIG_VSERVER_HISTORY)
  constants:
    VXS_FOO - state bits
    VXF_FOO - vserver flags (to select features)
    VXC_FOO - vserver-specific capabilities
    VCMD_get_version - vserver subcommand names
    VCI_VERSION - perhaps the legacy of this one should die.

Using the term "vserver" and ID term "vsid":

  CONFIG_VSERVER - config option
  typedef unsigned int vsid_t;
  struct vserver
  task_struct->vserver
  task_struct->vsid
  vserver_task_vsid(struct task*) - get a VSID from a task_struct
  vserver_current_vsid - get VSID for current
  vserver_state(struct vserver*, VS_STATE_FOO) - does vserver hav...
  create_vserver - creates a new context and "hashes" it
  lookup_vserver - lookup a vserver by vsid
  get_vserver - increase refcount of a vserver
  [...]
  release_vserver - decrease the process count for a vserver
  task_get_vserver - like get_vserver, but by process
  vserver_migrate_task - join task to a vserver
  vserver_debug - debugging printk (for CONFIG_VSERVER_DEBUG)
  vserver_hist_alloc_vserver - history tracing (for CONFIG_VSERVER...
  constants:
    VS_STATE_FOO - state bits
    VS_FLAG_FOO - vserver flags (to select features)
    VS_CAP_FOO - vserver-specific capabilities
    VS_CMD_get_version - vserver subcommand names
    VS_VCI_VERSION

Using the term "container" and ID term "cid":

  CONFIG_CONTAINERS - config option
  typedef unsigned int cid_t;
  struct container
  task_struct->container
  task_struct->cid
  container_task_cid(struct task*) - get a CID from a task_struct
  container_current_cid - get CID for current
  container_state(struct container*, CONTAINER_STATE_FOO) - does c...
  create_container - creates a new context and "hashes" it
  lookup_container - lookup a container by cid
  get_container - increase refcount of a container
  [...]
  release_container - decrease the process count for a container
  task_get_container - like get_container, but by process
  contain_task - join task to a container
  container_debug - debugging printk (for CONFIG_CONTAINER_DEBUG)
  container_hist_alloc_container - history tracing (for CONFIG_CON...
  constants:
    CONTAINER_STATE_FOO - state bits
    CONTAINER_FLAG_FOO - container flags (to select features)
    CONTAINER_CAP_FOO - container-specific capabilities
    CONTAINER_CMD_get_version - container subcommand names
    CONTAINER_VCI_VERSION

Using the term "box" and ID term "boxid":

  CONFIG_BOXES - config option
  typedef unsigned int boxid_t;
  struct box
  task_struct->box
  task_struct->boxid
  box_task_boxid(struct task*) - get a BOXID from a task_struct
  box_current_boxid - get BOXID for current
  box_state(struct box*, BOX_STATE_FOO) - does box have state
  create_box - creates a new context and "hashes" it
  lookup_box - lookup a box by boxid
  get_box - increase refcount of a box
  [...]
  release_box - decrease the process count for a box
  task_get_box - like get_box, but by process
  box_migrate_task - join task to a box
  box_printk - debugging printk (for CONFIG_BOXES_DEBUG)
  box_hist_alloc_box - history tracing (for CONFIG_BOXES_HISTORY)
  constants:
    BOX_STATE_FOO - state bits
    BOX_FLAG_FOO - box flags (to select features)
    BOX_CAP_FOO - box-specific capabilities
    BOX_CMD_get_version - box subcommand names
    BOX_VCI_VERSION

Using the term "family" and ID term "fid":

  CONFIG_FAMILY - config option
  typedef unsigned int fid_t;
  struct family
  task_struct->family
  task_struct->fid
  task_fid(struct task*) - get an FID from a task_struct
  family_current_fid - get FID for current
  family_state(struct family*, FAMILY_STATE_FOO) - does family hav...
  create_family - creates a new context and "hashes" it
  lookup_family - lookup a family by fid
  get_family - increase refcount of a family
  [...]
  release_family - decrease the process count for a family
  task_get_family - like get_family, but by process
  family_adopt_task - join task to a family
  family_printk - debugging printk (for CONFIG_FAMILY_DEBUG)
  family_hist_alloc_family - history tracing (for CONFIG_FAMILY_HI...
  constants:
    FAMILY_STATE_FOO - state bits
    FAMILY_FLAG_FOO - family flags (to select features)
    FAMILY_CAP_FOO - family-specific capabilities
    FAMILY_CMD_get_version - family subcommand names
    FAMILY_VCI_VERSION

Using the term "task_ns" and ID term "nsid":

  CONFIG_TASK_NS - config option
  typedef unsigned int nsid_t;
  struct task_ns
  task_struct->task_ns
  task_struct->nsid
  task_nsid(struct task*) - get an NSID from a task_struct
  current_nsid - get NSID for current
  task_ns_state(struct task_ns*, TASK_NS_STATE_FOO) - does task_ns hav...
  create_task_ns - creates a new context and "hashes" it
  lookup_task_ns - lookup a task_ns by nsid
  get_task_ns - increase refcount of a task_ns
  [...]
  release_task_ns - decrease the process count for a task_ns
  task_get_task_ns - like get_task_ns, but by process
  task_ns_migrate_task - join task to a task_ns
  task_ns_printk - debugging printk (for CONFIG_TASK_NS_DEBUG)
  task_ns_hist_alloc_task_ns - history tracing (for CONFIG_TASK_NS_HI...
  constants:
    TASK_NS_STATE_FOO - state bits
    TASK_NS_FLAG_FOO - task_ns flags (to select features)
    TASK_NS_CAP_FOO - task_ns-specific capabilities
    TASK_NS_CMD_get_version - task_ns subcommand names
    TASK_NS_VCI_VERSION

For the record, I like the term "process family" most. It implies the
possibility of strict grouping, like last name, as well as allowing
hierarchies, but of course in our modern, post-nuclear family age, does
not imply a fixed nature of anything ;-).

Happy picking.

Sam.
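The create/lookup/get entries that recur in each of the naming variants
above describe a small hashed, reference-counted object API. As a rough
user-space sketch of what that could look like under the "family"
naming: the bucket count, the hash function and put_family() are
assumptions added for illustration (the "family" list itself elides
whatever sits between get_family and release_family), and a kernel
version would additionally need locking, which is omitted here.

```c
#include <stdlib.h>

typedef unsigned int fid_t;

#define FAMILY_HASH_SIZE 16u    /* assumed bucket count, power of two */

struct family {
    fid_t fid;                  /* identifier handed to userland */
    unsigned int refcount;      /* references held on this structure */
    struct family *next;        /* hash chain */
};

static struct family *family_hash[FAMILY_HASH_SIZE];

static unsigned int family_hashfn(fid_t fid)
{
    return fid & (FAMILY_HASH_SIZE - 1u);
}

/* lookup_family - find a family by fid; takes no reference */
struct family *lookup_family(fid_t fid)
{
    struct family *f;

    for (f = family_hash[family_hashfn(fid)]; f != NULL; f = f->next)
        if (f->fid == fid)
            return f;
    return NULL;
}

/* get_family - increase refcount of a family */
struct family *get_family(struct family *f)
{
    if (f != NULL)
        f->refcount++;
    return f;
}

/*
 * put_family - hypothetical counterpart to get_family: drop one
 * reference, and unhash and free on the last one.  Returns 1 if freed.
 */
int put_family(struct family *f)
{
    struct family **pp;

    if (f == NULL || --f->refcount > 0)
        return 0;
    for (pp = &family_hash[family_hashfn(f->fid)]; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == f) {
            *pp = f->next;      /* unlink from the hash chain */
            break;
        }
    }
    free(f);
    return 1;
}

/*
 * create_family - allocate a family and "hash" it.  Returns it with
 * one reference held, or NULL if the fid is taken or memory runs out.
 */
struct family *create_family(fid_t fid)
{
    struct family *f;
    unsigned int h = family_hashfn(fid);

    if (lookup_family(fid) != NULL)
        return NULL;
    f = malloc(sizeof(*f));
    if (f == NULL)
        return NULL;
    f->fid = fid;
    f->refcount = 1;
    f->next = family_hash[h];
    family_hash[h] = f;
    return f;
}
```

Whichever prefix wins the naming discussion, only the identifiers
change; the shape of the functions stays the same.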
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-21 23:12 ` Sam Vilain
@ 2006-03-22  5:18   ` Sam Vilain
  2006-03-22  7:13   ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: Sam Vilain @ 2006-03-22 5:18 UTC
To: Sam Vilain
Cc: Dave Hansen, linux-kernel, Herbert Poetzl, Eric W.Biederman,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

Sam Vilain wrote:

>Using the term "task_ns" and ID term "nsid":
>
> CONFIG_TASK_NS - config option
> typedef unsigned int nsid_t;
> struct task_ns
> task_struct->task_ns
> task_struct->nsid
> task_nsid(struct task*) - get an NSID from a task_struct
> current_nsid - get NSID for current
> task_ns_state(struct task_ns*, TASK_NS_STATE_FOO) - does task_ns hav...
> create_task_ns - creates a new context and "hashes" it
> lookup_task_ns - lookup a task_ns by nsid
> get_task_ns - increase refcount of a task_ns
> [...]
> release_task_ns - decrease the process count for a task_ns
> task_get_task_ns - like get_task_ns, but by process
> task_ns_migrate_task - join task to a task_ns
> task_ns_printk - debugging printk (for CONFIG_TASK_NS_DEBUG)
> task_ns_hist_alloc_task_ns - history tracing (for CONFIG_TASK_NS_HI...
> constants:
> TASK_NS_STATE_FOO - state bits
> TASK_NS_FLAG_FOO - task_ns flags (to select features)
> TASK_NS_CAP_FOO - task_ns-specific capabilities
> TASK_NS_CMD_get_version - task_ns subcommand names
> TASK_NS_VCI_VERSION
>
>

One more (apparently suggested by Eric Biederman, though perhaps he had
different ideas about what it would look like)

  CONFIG_SPACE - config option
  typedef unsigned int space_t;
  struct space_info;
  task_struct->space
  task_struct->space_id
  task_space_id(struct task*) - get a SPACE_ID from a task_struct
  current_space_id - get SPACE_ID for current
  space_info_state(struct space_info*, TASK_SPACE_STATE_FOO) - does ...
  create_space - creates a new space and "hashes" it
  lookup_space - lookup a space_info by space_id
  get_space_info - increase refcount of a space_info
  put_space_info - decrease refcount of a space_info
  [...]
  grab_space - increase the process count for a space
  release_space - decrease the process count for a space
  task_get_space_info - like get_space_info, but by process
  space_migrate_task - join task to a space
  space_printk - debugging printk (for CONFIG_SPACE_DEBUG)
  space_hist_alloc_space - history tracing (for CONFIG_SPACE_HI...
  constants:
    SPACE_STATE_FOO - state bits
    SPACE_FLAG_FOO - space flags (to select features)
    SPACE_CAP_FOO - space-specific capabilities
    SPACE_CMD_get_version - space subcommand names
    SPACE_SYSCALL_VERSION

Something like that, anyway. I must admit "Task Spaces" sounds a little
less dorky than "Task Namespaces", but doesn't roll off the tongue that
well because of the '-sk s..' combination.

Anyone?

Sam.
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-21 23:12 ` Sam Vilain
  2006-03-22  5:18   ` Sam Vilain
@ 2006-03-22  7:13   ` Eric W. Biederman
  2006-03-23  4:17     ` Sam Vilain
  2006-03-24 15:36     ` [Devel] " Kirill Korotaev
  1 sibling, 2 replies; 29+ messages in thread
From: Eric W. Biederman @ 2006-03-22 7:13 UTC
To: Sam Vilain
Cc: Dave Hansen, linux-kernel, Herbert Poetzl,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

Sam Vilain <sam@vilain.net> writes:

> Dave Hansen wrote:
>
>>>This patch is simple, but does not handle SMP scalability very well
>>>(you'll get a lot of cacheline problems when you start actually using
>>>the container structure; the hashing helps a lot there)
>>>
>>>
>>Could you elaborate a bit on this one? What has cacheline problems?
>>
>>
>
> OK, on reflection this probably doesn't belong this early in the
> series. It only helps speed up lookup by IDs which is only an SMP
> speed enhancement if you expect a lot of those (eg, when storing the
> XIDs on another subsystem, such as a filesystem) and unusual things like
> creating/destroying a lot of vservers quickly. I'll pull it out.
>
> I tried to make it as basic as possible, but my basic problem with the
> patch you linked to is that it shouldn't be so small that you don't get
> a decent bunch of the internal API functions listed in description of
> part 1 of the set
> (http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc4-vsi/01-VServer-Umbrella.diff)

Right. I think the smallest thing that we can reasonably discuss with
real problems is the sysvipc namespace. This is why I suggested efforts
in that direction.

>>>and does not provide functions such as looking up a container by ID etc.
>>>
>>>
>>
>>We need something so simple that we probably don't even deal with ids.
>>I believe that Eric claims that we don't really need container _ids_.
>>For instance, the filesystem namespaces have no ids, and work just fine.
>>
>>
>
> For actual namespace structures themselves that are virtualising real
> things, I agree entirely.
>
> But I think the whole point of this patchset (which sadly vger ate for
> the wider audience, and I still don't know why) is to group tasks and to
> give userland, and hence the administrator, something tangible with
> which to group processes and tack namespace structures onto. I can
> split out the ID related stuff into another patch, but it would need to
> go back in before a useful syscall interface can be added. I'll
> consider it anyway, though.

So as best I can determine that is simply an implementation optimization.
(That is with a little refactoring we can add a structure like that later
if the performance concerns warrant it.)

Skipping that optimization we should be able to concentrate on the
fundamentals. Which should be simpler and allow better forward progress.

> Note that a given vx_info structure still might share namespace
> structures with other vx_info objects - this is what I was alluding to
> with the "we haven't made any restrictions on the nature of
> virtualisation yet" comment. All we've made is the (XID, PID) tuple
> (although the default for XID=0 is that all processes are visible in one
> big PID space). The intention is that flags will control how you want
> your PIDs to work for that vserver.

Do we need flags or can we move this to user space?

> The only thing that they can't do is share processes - it is a one to
> many relationship. However, if there can be a parent/child
> containership relationship on the XIDs themselves, you can still achieve
> the behaviour of these unusual situations. I'm working on the
> assumption that if we ever have to migrate trees of XIDs then we can
> virtualise how they look inside a vserver using flags.
>
> Eric, perhaps you can comment or refer to the earlier post where you
> made this argument.
> I tried to follow it but perhaps I missed the gist
> of one of the messages or one of the most important messages entirely.

I'm not certain which argument you are referring to but I will state my
general argument against adding additional global ids.

Ultimately you want to nest your vservers and the like inside of each
other, because if you haven't captured everything you can do in user
space there will be some applications that don't work, and ultimately
it is desirable to support all applications. When doing that you want
to use the same mechanism on the inside as you have on the outside.
Additional global identifiers make this very difficult.

You can always talk about resources by specifying the processes that
use them. It is one level of indirection, but it works and is simple.

>>By starting painfully simply, we can build on complexity in bits, and
>>justify it as we go.
>>
> This is my aim too! I'll keep chopping and changing.
>
>>>And also because the acronym "vx" makes the API look nice, at least to
>>>mine and Herbert's eyes, then when you go to the network virtualisation
>>>you get "nx_info", etc. However I'm thinking any of these terms might
>>>also be right:
>>>
>>> - "vserver" spelt in full
>>> - family
>>> - container
>>> - jail
>>> - task_ns (short for namespace)
>>>
>>>Perhaps we can get a ruling from core team on this one, as it's
>>>aesthetics :-).
>>>
>>>
>>
>>I was in a meeting with a few coworkers, and we were arguing a bit about
>>naming. One person there was a manager-type who didn't have any direct
>>involvement in the project. We asked him which naming was more clear.
>>
>>We need to think a bit like that. What is more clear to somebody who
>>has never read the code? (Hint "vx_" means nothing. :)
>>
>>
>
> OK, so let's look at all of the various names that stem from the use of
> the term "XID" and try to come up with a good naming system for each. I
> invite the OpenVZ team, Eric and anyone else to put forward their names
> as well.
>
> Sorry if this is a bit long. Again I invite anyone at all to come
> forward with a preference or list another set of alternatives. Let's
> nail this one down, it comes up every time any patchset like this is put
> forward.

> For the record, I like the term "process family" most. It implies the
> possibility of strict grouping, like last name, as well as allowing
> hierarchies, but of course in our modern, post-nuclear family age, does
> not imply a fixed nature of anything ;-).

Families don't sound bad, but this all feels like putting the cart
before the horse. It is infrastructure solving a problem that I am
not at all certain is interesting.

Eric
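Eric's alternative to global ids, naming a resource by pointing at a
process that uses it, can be illustrated with a small user-space sketch.
Nothing below is taken from any posted patch: struct ipc_ns, struct task
and the task_*() helpers are hypothetical names, and the model ignores
locking and everything else a kernel would need.

```c
#include <stdlib.h>

/* A namespace object with no identifier of its own. */
struct ipc_ns {
    unsigned int refcount;      /* number of tasks using this namespace */
};

struct task {
    int pid;
    struct ipc_ns *ipc;         /* the namespace this task lives in */
};

static struct ipc_ns *ipc_ns_new(void)
{
    struct ipc_ns *ns = malloc(sizeof(*ns));

    if (ns != NULL)
        ns->refcount = 1;
    return ns;
}

static void ipc_ns_put(struct ipc_ns *ns)
{
    if (ns != NULL && --ns->refcount == 0)
        free(ns);
}

/* Give the task a fresh, private namespace.  Returns 0 on success. */
int task_unshare_ipc(struct task *t)
{
    struct ipc_ns *ns = ipc_ns_new();

    if (ns == NULL)
        return -1;
    ipc_ns_put(t->ipc);
    t->ipc = ns;
    return 0;
}

/*
 * Join whatever namespace 'target' is using.  The namespace is named
 * only indirectly, through a task that lives in it, so no global id
 * table is needed and nesting needs no special cases.
 */
void task_join_ipc_of(struct task *t, const struct task *target)
{
    struct ipc_ns *ns = target->ipc;

    if (ns == t->ipc)
        return;                 /* already sharing */
    if (ns != NULL)
        ns->refcount++;
    ipc_ns_put(t->ipc);
    t->ipc = ns;
}
```

The cost is the one Sam raises in his reply: a caller has to be able to
name at least one live task in the group it wants to address.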
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-22  7:13 ` Eric W. Biederman
@ 2006-03-23  4:17   ` Sam Vilain
  2006-03-24 15:36   ` [Devel] " Kirill Korotaev
  1 sibling, 0 replies; 29+ messages in thread
From: Sam Vilain @ 2006-03-23 4:17 UTC
To: Eric W. Biederman
Cc: Dave Hansen, linux-kernel, Herbert Poetzl,
    OpenVZ developers list, Serge E.Hallyn, Andrew Morton

Eric W. Biederman wrote:

>>I tried to make it as basic as possible, but my basic problem with the
>>patch you linked to is that it shouldn't be so small that you don't get
>>a decent bunch of the internal API functions listed in description of
>>part 1 of the set
>>(http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc4-vsi/01-VServer-Umbrella.diff)
>>
>>
>
>Right. I think the smallest thing that we can reasonably discuss with
>real problems is the sysvipc namespace. This is why I suggested efforts
>in that direction.
>
>

I apologise for not contributing to that effort, however I was expecting
to be able to plug the deliverables of it into this framework. I hope
to participate in as many of these subsystem virtualisation discussions
as I can, but at that time it looked like there was enough input going
in.

>>But I think the whole point of this patchset (which sadly vger ate for
>>the wider audience, and I still don't know why) is to group tasks and to
>>give userland, and hence the administrator, something tangible with
>>which to group processes and tack namespace structures onto. I can
>>split out the ID related stuff into another patch, but it would need to
>>go back in before a useful syscall interface can be added. I'll
>>consider it anyway, though.
>>
>>
>So as best I can determine that is simply an implementation optimization.
>(That is with a little refactoring we can add a structure like that later
> if the performance concerns warrant it.)
>
>Skipping that optimization we should be able to concentrate on the
>fundamentals.
>Which should be simpler and allow better forward progress.
>
>

Sure, this is where we came to last time. I agree that it is an
optimisation, but not just an implementation optimisation.

In the abstract sense you have tuples of (task, namespace) for each type
of namespace. However for (task, utsname) this is a much weaker thing
to want per-process, and much more like you want it per process family.
So, it is also a design optimisation. Some tuples really are
(family, namespace) in the first instance.

>>Note that a given vx_info structure still might share namespace
>>structures with other vx_info objects - this is what I was alluding to
>>with the "we haven't made any restrictions on the nature of
>>virtualisation yet" comment. All we've made is the (XID, PID) tuple
>>(although the default for XID=0 is that all processes are visible in one
>>big PID space). The intention is that flags will control how you want
>>your PIDs to work for that vserver.
>>
>>
>Do we need flags or can we move this to user space?
>
>

Some of them influence behaviour in fastpath code. For instance, one
example of a flag is the scheduling policy. A process family might have
a flag set on it to group its scheduling together, to provide assertions
like "this family should get ¼ of the CPU". Clearly userspace can't
make this call.

But perhaps I don't get your meaning. I think they are a pragmatic
necessity.

>>The only thing that they can't do is share processes - it is a one to
>>many relationship. However, if there can be a parent/child
>>containership relationship on the XIDs themselves, you can still achieve
>>the behaviour of these unusual situations. I'm working on the
>>assumption that if we ever have to migrate trees of XIDs then we can
>>virtualise how they look inside a vserver using flags.
>>
>>Eric, perhaps you can comment or refer to the earlier post where you
>>made this argument.
>>I tried to follow it but perhaps I missed the gist
>>of one of the messages or one of the most important messages entirely.
>>
>>
>
>I'm not certain which argument you are referring to but I will state my general
>argument against adding additional global ids.
>
>Ultimately you want to nest your vservers and the like inside of each
>other, because if you haven't captured everything you can do in user
>space there will be some applications that don't work, and ultimately
>it is desirable to support all applications. When doing that you want
>to use the same mechanism on the inside as you have on the outside.
>Additional global identifiers make this very difficult.
>
>

Sure, you might set a flag on a process family to allow family_id values
to be rewritten on the system/user barrier.

>You can always talk about resources by specifying the processes that
>use them. It is one level of indirection, but it works and is simple.
>
>

Sometimes you might not know of any particular processes in the family.
Maybe it's got one process that is just doing
"while (1) { fork && exit }". How do you "catch" it?

>>For the record, I like the term "process family" most. It implies the
>>possibility of strict grouping, like last name, as well as allowing
>>hierarchies, but of course in our modern, post-nuclear family age, does
>>not imply a fixed nature of anything ;-).
>>
>>
>
>Families don't sound bad, but this all feels like putting the cart
>before the horse. It is infrastructure solving a problem that I am
>not at all certain is interesting.
>
>

So, you're breeding and selecting the horse. Fine, can't I start
nailing together the frame for the cart, and tanning the leather for the
bridle? Surely whichever horse we end up with, we'll still need a cart.

And don't forget the stables, too. We've got a lot of stables already,
so we know roughly what size the cart needs to be. Surely we won't want
people knocking those down and rebuilding them.
I'm just a bit concerned about this merging effort turning into a research project. I'd rather try and merge the approaches used by systems people are currently using than try to come up with something wildly new. After all, if the new developments are that flexible, they should be easily able to support the presented interfaces ranging from the circa 1999 FreeBSD jail() to a more modern vserver/container/openvz/etc API. Sam.
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-22 7:13 ` Eric W. Biederman 2006-03-23 4:17 ` Sam Vilain @ 2006-03-24 15:36 ` Kirill Korotaev 2006-03-27 12:45 ` Serge E. Hallyn 1 sibling, 1 reply; 29+ messages in thread From: Kirill Korotaev @ 2006-03-24 15:36 UTC (permalink / raw) To: devel Cc: Sam Vilain, Andrew Morton, linux-kernel, Herbert Poetzl, Serge E.Hallyn, Mishin Dmitry, Alexey Kuznetsov >> I tried to make it as basic as possible, but my basic problem with the >> patch you linked to is that it shouldn't be so small that you don't get >> a decent bunch of the internal API functions listed in description of >> part 1 of the set >> (http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc4-vsi/01-VServer-Umbrella.diff) > > Right. I think the smallest thing that we can reasonably discuss with > real problems is the sysvipc namespace. This is why I suggested efforts > in that direction. ok. I send to all on CC our IPC and utsname namespace patch. We implemented them as separate namespaces (to please Eric), compile time configurable (to please embedded people) and using "current" namespace context (as we do in OpenVZ). I suppose all these points can be disucssed. In many respects the patches look like a bit changed Eric/DHansen/OpenVZ. I suppose this should be discussed/worked out and commited to Linus/Andrew as there are any serious issues here. >>> We need something so simple that we probably don't even deal with ids. >>> I believe that Eric claims that we don't really need container _ids_. >>> For instance, the filesystem namespaces have no ids, and work just fine. Eric, the namespaces itself can be without any IDs, really. But on top of that projects like OpenVZ/vserver can implement VPSs, umbreallas etc. and will introduce IDs and interfaces they are accustomed of. 
> Ultimately you want to nest you vservers and the like inside of each > other, because if you haven't captured everything you can do in user > space there will be some applications that don't work, and ultimately > it is desirable to support all applications. When doing that you want > to use the same mechanism on the inside as you have on the outside. > Additional global identifiers make this very difficult. > > You can always talk about resources by specifying the processes that > use them. It is one level of indirection, but it works and is simple. I think it is not that easy as you wish to think of it. Consider 2 processes: A and B. each of them has its own fs namespace, but common network namespace. process A passes file descriptor fd to process B via unix socket. So the same file belongs to 2 different fs namespaces. Will you simply forbid fd passing to another fd namespace? I see 2 possibilities here. 1. You either introduce fully isolated resource namespaces, than it is flat, not nested. 2. Or you need to state correctly what "nesting" means from your point of view and which operations parent namespace can do with its children. Last time I had a talk with Herbert we ended up that such a nesting is nothing more than a privileges delegation, e.g. allows you to manage your children. Eric, can your describe what you want from nesting on some particular namespace, for example fs, ipc, networking, ...? Not by word "nesting", but by concrete operations which parent can do and why a child looks "nested". Just to make it more clear: my understanding of word "nested" means that if you have, for example, a nested IPC namespace, than parent can see all the resources (sems, shms, ...) of it's children and have some private, while children see only its own set of private resources. But it doesn't look like you are going to implement anything like this. So what is nesting then? Ability to create namespace? To delegate it some part of own resource limits? 
>>>> And also because the acronym "vx" makes the API look nice, at least to >>>> mine and Herbert's eyes, then when you go to the network virtualisation >>>> you get "nx_info", etc. However I'm thinking any of these terms might >>>> also be right: >>>> >>>> - "vserver" spelt in full >>>> - family >>>> - container >>>> - jail >>>> - task_ns (sort for namespace) >>>> >>>> Perhaps we can get a ruling from core team on this one, as it's >>>> aesthetics :-). I propose to use "namespace" naming. 1. This is already used in fs. 2. This is what IMHO suites at least OpenVZ/Eric 3. it has good acronym "ns". >> For the record, I like the term "process family" most. It implies the >> possibility strict grouping, like last name, as well as allowing >> heirarchies, but of course in our modern, post-nuclear family age, does >> not imply a fixed nature of anything ;-). > > Families don't sound bad, but this all feels like putting the cart > before the horse. It is infrastructure solving a problem that I am > not at all certain is interesting. agreed. Thanks, Kirill ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-24 15:36 ` [Devel] " Kirill Korotaev @ 2006-03-27 12:45 ` Serge E. Hallyn 2006-03-28 5:28 ` Sam Vilain 2006-03-29 12:07 ` Kirill Korotaev 0 siblings, 2 replies; 29+ messages in thread From: Serge E. Hallyn @ 2006-03-27 12:45 UTC (permalink / raw) To: Kirill Korotaev Cc: devel, Sam Vilain, Andrew Morton, linux-kernel, Herbert Poetzl, Serge E.Hallyn, Mishin Dmitry, Alexey Kuznetsov

Quoting Kirill Korotaev (dev@sw.ru):
> Just to make it more clear: my understanding of word "nested" means that
> if you have, for example, a nested IPC namespace, than parent can see
> all the resources (sems, shms, ...) of it's children and have some
> private, while children see only its own set of private resources. But
> it doesn't look like you are going to implement anything like this.
> So what is nesting then? Ability to create namespace? To delegate it
> some part of own resource limits?

Nesting simply means that any child ns can create child namespaces of
it's own.

In particular, the following scenario should be perfectly valid:

    Machine 1               Machine 2
      Xen VM1.1               Xen VM2.1
        vserv 1.1.1             vserv2.1.1
          cont1.1.1.1             cont2.1.1.1
          cont1.1.1.2             cont2.1.1.2
          cont1.1.1.n             cont2.1.1.n
        vserv 1.1.2             vserv2.1.2
          cont1.1.2.1             cont2.1.2.1
          cont1.1.2.2             cont2.1.2.2
          cont1.1.2.n             cont2.1.2.n
      Xen VM1.2               Xen VM2.2
        vserv 1.2.1             vserv2.2.1
          cont1.2.1.1             cont2.2.1.1
          cont1.2.1.2             cont2.2.1.2
          cont1.2.1.n             cont2.2.1.n
        vserv 1.2.2             vserv2.2.2
          cont1.2.2.1             cont2.2.2.1
          cont1.2.2.2             cont2.2.2.2
          cont1.2.2.n             cont2.2.2.n

where containers are used for each virtual server and each container,
so that we can migrate entire VMs, entire virtual servers, or any
container.

> >>>>Perhaps we can get a ruling from core team on this one, as it's
> >>>>aesthetics :-).
> I propose to use "namespace" naming.
> 1. This is already used in fs.
> 2. This is what IMHO suites at least OpenVZ/Eric
> 3. it has good acronym "ns".

I agree.
-serge
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-27 12:45 ` Serge E. Hallyn @ 2006-03-28 5:28 ` Sam Vilain 2006-03-29 12:07 ` Kirill Korotaev 1 sibling, 0 replies; 29+ messages in thread From: Sam Vilain @ 2006-03-28 5:28 UTC (permalink / raw) To: Serge E. Hallyn; +Cc: linux-kernel

On Mon, 2006-03-27 at 06:45 -0600, Serge E. Hallyn wrote:
> In particular, the following scenario should be perfectly valid:
>
>     Machine 1               Machine 2
>       Xen VM1.1               Xen VM2.1
>         vserv 1.1.1             vserv2.1.1
>           cont1.1.1.1             cont2.1.1.1

Precisely ... Xen and vserver are complementary, not contradictory.

Sam.
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-27 12:45 ` Serge E. Hallyn 2006-03-28 5:28 ` Sam Vilain @ 2006-03-29 12:07 ` Kirill Korotaev 2006-03-29 13:47 ` Serge E. Hallyn 1 sibling, 1 reply; 29+ messages in thread From: Kirill Korotaev @ 2006-03-29 12:07 UTC (permalink / raw) To: Serge E. Hallyn Cc: devel, Sam Vilain, Andrew Morton, linux-kernel, Herbert Poetzl, Mishin Dmitry, Alexey Kuznetsov

Serge,

Serge E. Hallyn wrote:
> Quoting Kirill Korotaev (dev@sw.ru):
>> Just to make it more clear: my understanding of word "nested" means that
>> if you have, for example, a nested IPC namespace, than parent can see
>> all the resources (sems, shms, ...) of it's children and have some
>> private, while children see only its own set of private resources. But
>> it doesn't look like you are going to implement anything like this.
>> So what is nesting then? Ability to create namespace? To delegate it
>> some part of own resource limits?
>
> Nesting simply means that any child ns can create child namespaces of
> it's own.

your picture below doesn't show that containers have nested containers.
You draw a plain container set inside vserv.

What I mean is that if some container user can create another container,
it DOES not mean it is nested. It is just about permissions to create
other containers. Nested containers in my POV is something different,
when you can see the resources of your container and your children. You see?

I will try to show what I mean on a picture:

--------------------------------------------------
|          ---------------------------------     |
|          |            ---------------    |     |
|          |            | cont 1.1.1  |    |     |
|          |            | shm1.1.1.1  |    |     |
|          |            | shm1.1.1.2  |    |     |
| cont 1.  | cont 1.1   ---------------    |     |
| shm1.1   | shm1.1.1   ---------------    |     |
| shm1.2   |            | cont 1.1.2  |    |     |
|          |            | shm1.1.2.1  |    |     |
|          |            ---------------    |     |
|          ---------------------------------     |
|-------------------------------------------------

You see what I mean?
In this example with IPC shared memory, container 1 can see all the shm
segments, while container1.1.2 can see only its private one, shm1.1.2.1.

And if resources are not nested like this, then it is a PLAIN container
structure.

Kirill

> In particular, the following scenario should be perfectly valid:
>
>     Machine 1               Machine 2
>       Xen VM1.1               Xen VM2.1
>         vserv 1.1.1             vserv2.1.1
>           cont1.1.1.1             cont2.1.1.1
>           cont1.1.1.2             cont2.1.1.2
>           cont1.1.1.n             cont2.1.1.n
>         vserv 1.1.2             vserv2.1.2
>           cont1.1.2.1             cont2.1.2.1
>           cont1.1.2.2             cont2.1.2.2
>           cont1.1.2.n             cont2.1.2.n
>       Xen VM1.2               Xen VM2.2
>         vserv 1.2.1             vserv2.2.1
>           cont1.2.1.1             cont2.2.1.1
>           cont1.2.1.2             cont2.2.1.2
>           cont1.2.1.n             cont2.2.1.n
>         vserv 1.2.2             vserv2.2.2
>           cont1.2.2.1             cont2.2.2.1
>           cont1.2.2.2             cont2.2.2.2
>           cont1.2.2.n             cont2.2.2.n
>
> where containers are used for each virtual server and each container,
> so that we can migrate entire VMs, entire virtual servers, or any
> container.
>
>>>>>> Perhaps we can get a ruling from core team on this one, as it's
>>>>>> aesthetics :-).
>> I propose to use "namespace" naming.
>> 1. This is already used in fs.
>> 2. This is what IMHO suites at least OpenVZ/Eric
>> 3. it has good acronym "ns".
>
> I agree.
>
> -serge
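The visibility rule Kirill describes in this message (a parent sees its own resources plus those of every container nested below it, a child sees only its own) can be sketched in a few lines of user-space C. This is only an illustration: `struct container`, `struct resource`, and both helper functions are hypothetical names, not taken from the VServer or OpenVZ patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container node; only the parent link matters here. */
struct container {
    const char *name;
    const struct container *parent;   /* NULL for the outermost container */
};

/* A resource (e.g. a shm segment) owned by exactly one container. */
struct resource {
    const char *name;
    const struct container *owner;
};

/* True if 'c' is 'ancestor' itself or is nested somewhere below it. */
static bool container_is_within(const struct container *c,
                                const struct container *ancestor)
{
    for (; c != NULL; c = c->parent)
        if (c == ancestor)
            return true;
    return false;
}

/*
 * Nested visibility: a viewer sees a resource if the resource's owner is
 * the viewer itself or one of the viewer's descendants.
 */
static bool container_can_see(const struct container *viewer,
                              const struct resource *res)
{
    assert(viewer != NULL && res != NULL);
    return container_is_within(res->owner, viewer);
}
```

With the containers from the picture, `cont 1` would see `shm1.1.2.1`, while `cont 1.1.2` would see neither `shm1.1.1.1` nor `shm1.1`. In a "plain" structure the walk up the parent chain would simply be replaced by an equality test.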
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-29 12:07 ` Kirill Korotaev @ 2006-03-29 13:47 ` Serge E. Hallyn 2006-03-29 21:30 ` Sam Vilain 0 siblings, 1 reply; 29+ messages in thread From: Serge E. Hallyn @ 2006-03-29 13:47 UTC (permalink / raw) To: Kirill Korotaev Cc: Serge E. Hallyn, devel, Sam Vilain, Andrew Morton, linux-kernel, Herbert Poetzl, Mishin Dmitry, Alexey Kuznetsov Quoting Kirill Korotaev (dev@sw.ru): > Serge, > > Serge E. Hallyn wrote: > >Quoting Kirill Korotaev (dev@sw.ru): > >>Just to make it more clear: my understanding of word "nested" means that > >>if you have, for example, a nested IPC namespace, than parent can see > >>all the resources (sems, shms, ...) of it's children and have some > >>private, while children see only its own set of private resources. But > >>it doesn't look like you are going to implement anything like this. > >>So what is nesting then? Ability to create namespace? To delegate it > >>some part of own resource limits? > > > >Nesting simply means that any child ns can create child namespaces of > >it's own. > your picture below doesn't show that containers have nested containers. > You draw a plain container set inside vserv. And I am assuming that vserv is implemented as a container, hence this is an example of nested containers very likely to be used. But given what I now think is your definition of nested, I think we are agreed. > What I mean is that if some container user can create another container, > it DOES not mean it is nested. It is just about permitions to create > other containers. Nested containers in my POV is something different, > when you can see the resources of your container and your children. You see? Alas, the spacing on the picture didn't quite work out :) I think that by nested containers, you mean overlapping nested containers. In your example, how are you suggesting that cont1 refers to items in container1.1.2's shmem? 
I assume, given your previous posts on openvz, that you want every shmem id in all namespaces "nested" under cont1 to be unique, and for cont1 to refer to any item in container1.1.2's namespace just as it would any of cont1's own shmem? In that case I am not sure of the actual usefulness. Someone with different use for containers (you? :) will need to justify it. For me, pure isolation works just fine. Clearly it will be most useful if we want fine-grained administration, from parent namespaces, of the items in a child namespace. -serge > I will try to show what I mean on a picture: > > -------------------------------------------------- > | --------------------------------- | > | | --------------- > | | | | > | cont 1.1.1 | | | > | | | shm1.1.1.1 | | | > | | | > shm1.1.1.2 | | | | > cont 1. | cont 1.1 --------------- | | > | shm1.1 | shm1.1.1 --------------- | | > | shm1.2 | | cont 1.1.2 | > | | | | > | shm1.1.2.1 | | | > | | --------------- | | > | > --------------------------------- | > |-------------------------------------------------- > > You see what I mean? In this example with IPC sharememory container 1 > can see all the shm segments. while container1.1.2 can see only his > private one smm1.1.2.1. > > And if resources are not nested like this, than it is a PLAIN container > structure. 
> > Kirill > > >In particular, the following scenario should be perfectly valid: > > > > Machine 1 Machine 2 > > Xen VM1.1 Xen VM2.1 > > vserv 1.1.1 vserv2.1.1 > > cont1.1.1.1 cont2.1.1.1 > > cont1.1.1.2 cont2.1.1.2 > > cont1.1.1.n cont2.1.1.n > > vserv 1.1.2 vserv2.1.2 > > cont1.1.2.1 cont2.1.2.1 > > cont1.1.2.2 cont2.1.2.2 > > cont1.1.2.n cont2.1.2.n > > Xen VM1.2 Xen VM2.2 > > vserv 1.2.1 vserv2.2.1 > > cont1.2.1.1 cont2.2.1.1 > > cont1.2.1.2 cont2.2.1.2 > > cont1.2.1.n cont2.2.1.n > > vserv 1.2.2 vserv2.2.2 > > cont1.2.2.1 cont2.2.2.1 > > cont1.2.2.2 cont2.2.2.2 > > cont1.2.2.n cont2.2.2.n > > > >where containers are used for each virtual server and each container, > >so that we can migrate entire VMs, entire virtual servers, or any > >container. > > > >>>>>>Perhaps we can get a ruling from core team on this one, as it's > >>>>>>aesthetics :-). > >>I propose to use "namespace" naming. > >>1. This is already used in fs. > >>2. This is what IMHO suites at least OpenVZ/Eric > >>3. it has good acronym "ns". > > > >I agree. > > > >-serge > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-29 13:47 ` Serge E. Hallyn @ 2006-03-29 21:30 ` Sam Vilain 2006-04-19 7:50 ` Eric W. Biederman 0 siblings, 1 reply; 29+ messages in thread From: Sam Vilain @ 2006-03-29 21:30 UTC (permalink / raw) To: Serge E. Hallyn Cc: Kirill Korotaev, devel, Andrew Morton, linux-kernel, Herbert Poetzl, Mishin Dmitry, Alexey Kuznetsov On Wed, 2006-03-29 at 07:47 -0600, Serge E. Hallyn wrote: > Alas, the spacing on the picture didn't quite work out :) I think that > by nested containers, you mean overlapping nested containers. In your > example, how are you suggesting that cont1 refers to items in > container1.1.2's shmem? I assume, given your previous posts on openvz, > that you want every shmem id in all namespaces "nested" under cont1 to > be unique, and for cont1 to refer to any item in container1.1.2's > namespace just as it would any of cont1's own shmem? > > In that case I am not sure of the actual usefulness. Someone with > different use for containers (you? :) will need to justify it. For me, > pure isolation works just fine. Clearly it will be most useful if we > want fine-grained administration, from parent namespaces, of the items > in a child namespace. The overlapping is important if you want to pretend that the namespace-able resources are allowed to be specified per-process, when really they are specified per-family. In this way, a process family is merely a grouping of processes with like namespaces, and depending on which way they overlap you get the same behaviour as when processes only have one resource different, and therefore remove the overhead on fork(). Sam. ^ permalink raw reply [flat|nested] 29+ messages in thread
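Sam's point about fork() overhead can be illustrated with a small user-space sketch. It assumes a hypothetical `struct family` that bundles namespace pointers behind one reference count, so that a fork copies a single pointer instead of one pointer per namespace type. The names and layout are illustrative only and do not come from the posted patches; the namespaces themselves are deliberately not reference-counted here to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

struct uts_ns { char nodename[65]; };
struct ipc_ns { int shmmax; };

/* Hypothetical "process family": one refcounted bundle of namespaces. */
struct family {
    int refcount;
    struct uts_ns *uts;
    struct ipc_ns *ipc;
};

struct task {
    int pid;
    struct family *family;
};

/* fork(): the child joins the parent's family (one pointer, one increment). */
static void task_fork(const struct task *parent, struct task *child, int pid)
{
    assert(parent != NULL && child != NULL);
    child->pid = pid;
    child->family = parent->family;
    child->family->refcount++;
}

/*
 * Move 'task' into a new family that differs from its old one only in the
 * UTS namespace; every other namespace pointer stays shared ("overlapping").
 * Returns 0 on success, -1 if allocation fails.
 */
static int task_unshare_uts(struct task *task, struct uts_ns *new_uts)
{
    struct family *nf = malloc(sizeof(*nf));

    if (nf == NULL)
        return -1;
    *nf = *task->family;        /* copies the ipc pointer, so ipc is shared */
    nf->refcount = 1;
    nf->uts = new_uts;
    task->family->refcount--;   /* leave the old family */
    task->family = nf;
    return 0;
}
```

The cost of changing one namespace is paid once, when the new family is created, rather than on every fork.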
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-29 21:30 ` Sam Vilain @ 2006-04-19 7:50 ` Eric W. Biederman 2006-04-19 21:42 ` Sam Vilain 0 siblings, 1 reply; 29+ messages in thread From: Eric W. Biederman @ 2006-04-19 7:50 UTC (permalink / raw) To: Sam Vilain Cc: Serge E. Hallyn, Kirill Korotaev, devel, Andrew Morton, linux-kernel, Herbert Poetzl, Mishin Dmitry, Alexey Kuznetsov Sam Vilain <sam@vilain.net> writes: > On Wed, 2006-03-29 at 07:47 -0600, Serge E. Hallyn wrote: >> Alas, the spacing on the picture didn't quite work out :) I think that >> by nested containers, you mean overlapping nested containers. In your >> example, how are you suggesting that cont1 refers to items in >> container1.1.2's shmem? I assume, given your previous posts on openvz, >> that you want every shmem id in all namespaces "nested" under cont1 to >> be unique, and for cont1 to refer to any item in container1.1.2's >> namespace just as it would any of cont1's own shmem? >> >> In that case I am not sure of the actual usefulness. Someone with >> different use for containers (you? :) will need to justify it. For me, >> pure isolation works just fine. Clearly it will be most useful if we >> want fine-grained administration, from parent namespaces, of the items >> in a child namespace. > > The overlapping is important if you want to pretend that the > namespace-able resources are allowed to be specified per-process, when > really they are specified per-family. > > In this way, a process family is merely a grouping of processes with > like namespaces, and depending on which way they overlap you get the > same behaviour as when processes only have one resource different, and > therefore remove the overhead on fork(). I missed this subthread originally. I think it is important that we can have containers in containers if at all possible. This means large software collections can count on them being present. 
As for having some items inside a namespace show up in both a parent and a child namespace I think the case is less clearly defined. If possible that is something we want to avoid as it complicates the implementation. For pids I will be surprised if we can avoid it. For most other namespaces I think we can, and it is a good thing to avoid. Eric
* Re: [Devel] Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-04-19 7:50 ` Eric W. Biederman @ 2006-04-19 21:42 ` Sam Vilain 0 siblings, 0 replies; 29+ messages in thread From: Sam Vilain @ 2006-04-19 21:42 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Kirill Korotaev, devel, Andrew Morton, linux-kernel, Herbert Poetzl, Mishin Dmitry, Alexey Kuznetsov Eric W. Biederman wrote: >>The overlapping is important if you want to pretend that the >>namespace-able resources are allowed to be specified per-process, when >>really they are specified per-family. >> >>In this way, a process family is merely a grouping of processes with >>like namespaces, and depending on which way they overlap you get the >>same behaviour as when processes only have one resource different, and >>therefore remove the overhead on fork(). >> >> > > >I missed this subthread originally. > >I think it is important that we can have containers in containers >if at all possible. This means large software collections can count >on them being present. > > Right. Well, my concept was that that "in"-ness is just a relationship between two process families, so treat it relationally and not heirarchically. So, to the kernel, they've all got global unique IDs, but to the actual userspace within those families, they might see something different. And the model still supports containers in containers. >As for having some items inside a namespace show up in both >a parent and a child namespace I think the case is less clearly >defined. If possible that is something we want to avoid as it >complicates the implementation. > >For pids I will be surprised if we can avoid it. > >For most other namespaces I think we can, and it is a good thing >to avoid. > > Well, let's cross those bridges when we come to them. I agree it should only be implemented if required for individual numbers. PIDs and process family IDs seem logical to me, at this point anyway. Sam. 
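One way to picture "globally unique to the kernel, something different to userspace" is a per-family translation table applied at the system/user boundary. The sketch below is an assumption-laden illustration: `struct id_view` and its helpers are hypothetical, the table is a fixed-size array purely for brevity, and nothing here reflects how any of the discussed patch sets actually store IDs.

```c
#include <assert.h>
#include <stddef.h>

#define FAMILY_MAX_IDS 8

/*
 * Hypothetical per-family view: slot i holds the kernel-wide ID that the
 * family sees as local ID i + 1.
 */
struct id_view {
    int global[FAMILY_MAX_IDS];
    size_t count;
};

/* Make a global ID visible in this view; returns its local ID, or -1 if full. */
static int id_view_add(struct id_view *v, int global_id)
{
    assert(v != NULL);
    if (v->count == FAMILY_MAX_IDS)
        return -1;
    v->global[v->count] = global_id;
    v->count++;
    return (int)v->count;
}

/* Local to global, as done on the way into the kernel; -1 if unknown. */
static int id_to_global(const struct id_view *v, int local_id)
{
    if (local_id < 1 || (size_t)local_id > v->count)
        return -1;
    return v->global[local_id - 1];
}

/* Global to local, as done on the way out; -1 if not visible from this view. */
static int id_to_local(const struct id_view *v, int global_id)
{
    for (size_t i = 0; i < v->count; i++)
        if (v->global[i] == global_id)
            return (int)i + 1;
    return -1;
}
```

An ID that is not in a family's table is simply invisible to it, which is the isolation property. A parent family that also lists the same global ID in its own table gets the overlapping behaviour Eric expects to be unavoidable for pids.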
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-21 18:50 ` [RFC] [PATCH 0/7] Some basic vserver infrastructure Dave Hansen 2006-03-21 21:08 ` Sam Vilain @ 2006-03-22 6:41 ` Eric W. Biederman 2006-03-23 4:29 ` Sam Vilain 2006-03-24 15:37 ` Kirill Korotaev 1 sibling, 2 replies; 29+ messages in thread From: Eric W. Biederman @ 2006-03-22 6:41 UTC (permalink / raw) To: Dave Hansen Cc: Sam Vilain, linux-kernel, Herbert Poetzl, OpenVZ developers list, Serge E.Hallyn, Andrew Morton Dave Hansen <haveblue@us.ibm.com> writes: > On Tue, 2006-03-21 at 18:13 +1200, Sam Vilain wrote: >> Here is a work in progress of trying to extract some the core vserver >> architecture and present it as an incremental set of patches. > > Hi Sam, > > These patches are certainly getting better and better broken out all the > time. Nice work. > > But, I worry that they just aren't generic enough yet. I don't see any > response from any of the other "container/namespace/vps" people. I fear > that this means that they don't look broadly useful enough, yet. Not broadly useful is certainly my impression. It feels to me like these patches are simply doing too much. > That said, at this point, I'd just about rather have _anything_ merged > than the nothing we have at this point. As we throw patches back and > forth, we can't seem to agree on even some very small points. > > I also have a sinking feeling that everybody has gone back off and > continues to develop their own out-of-tree functionality, deepening the > patch divide. I certainly have not. I do feel that developing this just from the top down is the wrong way to do this. In some of the preliminary patches we have found several pieces of code that we will have to touch that is currently in need of a cleanup. That is why I have been cleaning up /proc. sysctl is in need of similar treatment but is in less bad shape. 
Part of it is that I have stopped to look more closely at what other people are doing and to look at alternative implementations. One interesting thing I have manged to do is by using ptrace I have implemented enter for the existing filesystem namespaces without having to modify the kernel. This at least says that enter and debugging are two faces of the same coin. > Is there anything we could merge that we _all_ don't like? I'm pretty > convinced that no single solution will support Eric's, OpenVZ's, and > VServer's _existing_ usage models. Somebody is going to have to bend, > or nothing will ever get merged. Any volunteers? ;) I don't think that is the case on the fundamentals. I think with pids I am an inch away from implementing a pid namespace that is both recursive, efficient, and can map all of the pids into another pid space if that is desirable. Plus I can merge most of it incrementally in the existing kernel, before I even allow for multiple pid spaces. Which should reduce the patch for multiple pid namespaces to something reasonable to talk about. > What about going back to the very simple "struct container" on which to > build? I guess my problem there is that isn't something on which to build that is something to hang things off of. Eric ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-22 6:41 ` Eric W. Biederman @ 2006-03-23 4:29 ` Sam Vilain 2006-03-23 4:50 ` Andrew Morton 2006-03-24 15:38 ` Kirill Korotaev 2006-03-24 15:37 ` Kirill Korotaev 1 sibling, 2 replies; 29+ messages in thread From: Sam Vilain @ 2006-03-23 4:29 UTC (permalink / raw) To: Eric W. Biederman Cc: Dave Hansen, linux-kernel, Herbert Poetzl, OpenVZ developers list, Serge E.Hallyn, Andrew Morton Eric W. Biederman wrote: >I certainly have not. I do feel that developing this just from the >top down is the wrong way to do this. > > OK, you said "just" the top down. That's fine, I can agree with that. Obviously if any of the implementation of the top down features hits ugliness that needs refactoring, that refactoring has to go first. >In some of the preliminary >patches we have found several pieces of code that we will have to >touch that is currently in need of a cleanup. That is why I have >been cleaning up /proc. sysctl is in need of similar treatment >but is in less bad shape. > > I made a preliminary attempt to rebase the /proc hooks atop of your work. I looked forward to being ready for if a patchset like this got adopted to -mm to be able to hand that piece over :-). The reason I didn't persue that to completion was I wasn't sure how I would then submit it - relative to the -mm tree? I didn't want to include your patches in my series. It would be nice if someone could make a git branch on kernel.org for each -mm release so it can be more easily imported to a git tree. Sure, `stg import' might take a while for all 1,400 patches, but that's OK - so long as Andrew's not waiting for it :-). >I don't think that is the case on the fundamentals. I think with pids >I am an inch away from implementing a pid namespace that is both >recursive, efficient, and can map all of the pids into another pid >space if that is desirable. 
Plus I can merge most of it incrementally >in the existing kernel, before I even allow for multiple pid spaces. > >Which should reduce the patch for multiple pid namespaces to something >reasonable to talk about. > > Well I see pids as just another virtualisable entity, but OK... if you come up with anything I'm happy to rebase atop it. >>What about going back to the very simple "struct container" on which to >>build? >> >> > >I guess my problem there is that isn't something on which to build >that is something to hang things off of. > > Yes, hanging things off is the intention, but they are both starting points, just from different perspectives. Sam. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-23 4:29 ` Sam Vilain @ 2006-03-23 4:50 ` Andrew Morton 2006-03-24 15:38 ` Kirill Korotaev 1 sibling, 0 replies; 29+ messages in thread From: Andrew Morton @ 2006-03-23 4:50 UTC (permalink / raw) To: Sam Vilain; +Cc: ebiederm, haveblue, linux-kernel, herbert, dev, serue

Sam Vilain <sam@vilain.net> wrote:
>
> It would be nice if someone could make a git branch on kernel.org for
> each -mm release so it can be more easily imported to a git tree.

http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/smurf/linux-trees.git;a=summary
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-23 4:29 ` Sam Vilain 2006-03-23 4:50 ` Andrew Morton @ 2006-03-24 15:38 ` Kirill Korotaev 1 sibling, 0 replies; 29+ messages in thread From: Kirill Korotaev @ 2006-03-24 15:38 UTC (permalink / raw) To: Sam Vilain Cc: Eric W. Biederman, Dave Hansen, linux-kernel, Herbert Poetzl, OpenVZ developers list, Serge E.Hallyn, Andrew Morton

> I made a preliminary attempt to rebase the /proc hooks atop of your
> work. I looked forward to being ready for if a patchset like this got
> adopted to -mm to be able to hand that piece over :-).

I strongly object to using /proc hooks to get a virtualization-like
solution. You end up with lots of hooks and unmaintainable code.

Also, though it works fine with a small subset of /proc and sysctl, it
works poorly with dynamic trees, as the same entries can be created in a
container though they already exist in the host, e.g. network device names.

Thanks,
Kirill
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure 2006-03-22 6:41 ` Eric W. Biederman 2006-03-23 4:29 ` Sam Vilain @ 2006-03-24 15:37 ` Kirill Korotaev 2006-03-24 20:28 ` Eric W. Biederman 1 sibling, 1 reply; 29+ messages in thread From: Kirill Korotaev @ 2006-03-24 15:37 UTC (permalink / raw) To: Eric W. Biederman Cc: Dave Hansen, Sam Vilain, linux-kernel, Herbert Poetzl, OpenVZ developers list, Serge E.Hallyn, Andrew Morton Hello, >> But, I worry that they just aren't generic enough yet. I don't see any >> response from any of the other "container/namespace/vps" people. I fear >> that this means that they don't look broadly useful enough, yet. > > Not broadly useful is certainly my impression. > It feels to me like these patches are simply doing too much. Exactly! These patches are really too big and can't be called clean... naming, bunch of debug etc. I can post the same amount of OpenVZ stuf and Herbert will claim his code is more clear after that. Also these patches contradict to what was discussed before: different name spaces for each subsystem. >> That said, at this point, I'd just about rather have _anything_ merged >> than the nothing we have at this point. As we throw patches back and >> forth, we can't seem to agree on even some very small points. >> >> I also have a sinking feeling that everybody has gone back off and >> continues to develop their own out-of-tree functionality, deepening the >> patch divide. > I certainly have not. I do feel that developing this just from the > top down is the wrong way to do this. In some of the preliminary > patches we have found several pieces of code that we will have to > touch that is currently in need of a cleanup. That is why I have > been cleaning up /proc. sysctl is in need of similar treatment > but is in less bad shape. Eric, though I suggest to postpone proc and sysctl a bit, can you share me your vision of /proc and /sysctl virtualization a bit? 
A good way to handle them IMHO is to make fully virtual, i.e. each namespace should have an own set of sysctl or proc tree. > Part of it is that I have stopped to look more closely at what > other people are doing and to look at alternative implementations. If you need any help with it in OpenVZ, feel free to ask. We have broken-out patches for recent 2.6.16 kernel. > One interesting thing I have manged to do is by using ptrace I > have implemented enter for the existing filesystem namespaces > without having to modify the kernel. This at least says > that enter and debugging are two faces of the same coin. Hmmm, strange claim/conclusion... /dev/kmem allows to change namespaces also :) and even to obtain root priviliges if needed... :) >> Is there anything we could merge that we _all_ don't like? I'm pretty >> convinced that no single solution will support Eric's, OpenVZ's, and >> VServer's _existing_ usage models. Somebody is going to have to bend, >> or nothing will ever get merged. Any volunteers? ;) > > I don't think that is the case on the fundamentals. I think with pids > I am an inch away from implementing a pid namespace that is both > recursive, efficient, and can map all of the pids into another pid > space if that is desirable. Plus I can merge most of it incrementally > in the existing kernel, before I even allow for multiple pid spaces. Eric, let's not compare approaches with inches :) As you remember your PID namespaces doesn't suite us well... :( Thanks, Kirill ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 15:37 ` Kirill Korotaev
@ 2006-03-24 20:28 ` Eric W. Biederman
  2006-03-24 21:01 ` Herbert Poetzl
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-03-24 20:28 UTC
To: Kirill Korotaev
Cc: Dave Hansen, Sam Vilain, linux-kernel, Herbert Poetzl, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

Kirill Korotaev <dev@sw.ru> writes:

>> I certainly have not. I do feel that developing this just from the
>> top down is the wrong way to do this. In some of the preliminary
>> patches we have found several pieces of code that we will have to
>> touch that is currently in need of a cleanup. That is why I have
>> been cleaning up /proc. sysctl is in need of similar treatment
>> but is in less bad shape.
> Eric, though I suggest postponing proc and sysctl a bit, can you share
> your vision of /proc and /sysctl virtualization with me?
> A good way to handle them IMHO is to make them fully virtual, i.e. each
> namespace should have its own set of sysctls or proc tree.

Roughly I agree. Some cases are easier than others. So let me take
just the sysvipc case as an example.

My thinking is to move the calls for printing the sysvipc namespace
from fs/proc/generic.c (with all of its cool helpers) to
fs/proc/base.c.

So we wind up with:
/proc/<pid>/sysvipc/msg
/proc/<pid>/sysvipc/sem
/proc/<pid>/sysvipc/shm
/proc/sysvipc -> /proc/self/sysvipc

For sysctl we add a method to fetch the address of
the variable and perhaps a few other attributes;
that method is passed a task structure.

Then we can have per process instances of:
/proc/<pid>/sys/sem
/proc/<pid>/sys/shmall
/proc/<pid>/sys/shmmax
/proc/<pid>/sys/msgmax
/proc/<pid>/sys/msgmni
/proc/<pid>/sys/shmmni
And a symlink at:
/proc/sys that points to /proc/<pid>/sys

Getting sysvipc to show up in a per process fashion is pretty
easy. Getting the entire sys hierarchy to show up per process
is a little harder simply because I think to do it cleanly requires
helper functions that I don't have yet. I have removed all of
the internal dependence on magic inode numbers completely removing
the hard coded inode numbers and putting sys looks doable.

Does that sound like a reasonable model?

>> Part of it is that I have stopped to look more closely at what
>> other people are doing and to look at alternative implementations.
> If you need any help with it in OpenVZ, feel free to ask. We have
> broken-out patches for the recent 2.6.16 kernel.

>> One interesting thing I have managed to do is by using ptrace I
>> have implemented enter for the existing filesystem namespaces without having
>> to modify the kernel. This at least says
>> that enter and debugging are two faces of the same coin.
> Hmmm, strange claim/conclusion... /dev/kmem also allows changing
> namespaces :) and even obtaining root privileges if needed... :)

True. However this is much less ugly than using /dev/kmem, and it is
much closer to what applications like user mode linux do. The primary
question in my mind was what should the permissions checks be when
performing this kind of action. Using ptrace satisfied that.

So I now have a bounding box for what enter should be able to do
and what permissions it should take.

> Eric, let's not compare approaches with inches :)
> As you remember, your PID namespaces don't suit us well... :(

More discussion when the time is right. But I believe I have solved
the fundamental incompatibility that we had. I asked you a question to
confirm that a while ago, but I have not heard anything back.

Eric
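A minimal userspace sketch of the sysctl idea in this message: each entry carries a method that is passed a task and returns the address of the variable, instead of pointing at one global. Everything here (`struct ipc_ns`, `struct task`, `struct sysctl_entry`, `sysctl_read`) is a hypothetical stand-in invented for illustration; it is not the kernel's sysctl API and not code from any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-namespace set of sysvipc tunables. */
struct ipc_ns {
    int shmmax;
    int msgmax;
};

/* Hypothetical stand-in for a task; several tasks may share one ipc_ns. */
struct task {
    int pid;
    struct ipc_ns *ipc;
};

/* Assumption: an entry holds a lookup method rather than a data pointer. */
struct sysctl_entry {
    const char *name;
    int *(*lookup)(struct task *t);
};

int *lookup_shmmax(struct task *t)
{
    assert(t != NULL && t->ipc != NULL);
    return &t->ipc->shmmax;
}

int *lookup_msgmax(struct task *t)
{
    assert(t != NULL && t->ipc != NULL);
    return &t->ipc->msgmax;
}

const struct sysctl_entry ipc_sysctls[] = {
    { "shmmax", lookup_shmmax },
    { "msgmax", lookup_msgmax },
};

/*
 * Read the value a file such as /proc/<pid>/sys/<name> would show for
 * task t. Returns 0 on success, -1 if the name is not in the table.
 */
int sysctl_read(struct task *t, const char *name, int *out)
{
    size_t n = sizeof(ipc_sysctls) / sizeof(ipc_sysctls[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(ipc_sysctls[i].name, name) == 0) {
            *out = *ipc_sysctls[i].lookup(t);
            return 0;
        }
    }
    return -1;
}
```

In this sketch two tasks that share an `ipc_ns` read the same value, and a task in another namespace reads its own, which is the "per namespace under the covers, per process in the path" behaviour described above.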
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 20:28 ` Eric W. Biederman
@ 2006-03-24 21:01 ` Herbert Poetzl
  2006-03-24 21:13 ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Herbert Poetzl @ 2006-03-24 21:01 UTC
To: Eric W. Biederman
Cc: Kirill Korotaev, Dave Hansen, Sam Vilain, linux-kernel, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

On Fri, Mar 24, 2006 at 01:28:49PM -0700, Eric W. Biederman wrote:
> Kirill Korotaev <dev@sw.ru> writes:
>
>
> >> I certainly have not. I do feel that developing this just from the
> >> top down is the wrong way to do this. In some of the preliminary
> >> patches we have found several pieces of code that we will have to
> >> touch that is currently in need of a cleanup. That is why I have
> >> been cleaning up /proc. sysctl is in need of similar treatment
> >> but is in less bad shape.
> > Eric, though I suggest postponing proc and sysctl a bit, can you share
> > your vision of /proc and /sysctl virtualization with me?
> > A good way to handle them IMHO is to make them fully virtual, i.e. each
> > namespace should have its own set of sysctls or proc tree.
>
> Roughly I agree. Some cases are easier than others. So let me take
> just the sysvipc case as an example.
>
> My thinking is to move the calls for printing the sysvipc namespace
> from fs/proc/generic.c (with all of its cool helpers) to
> fs/proc/base.c.
>
> So we wind up with:
> /proc/<pid>/sysvipc/msg
> /proc/<pid>/sysvipc/sem
> /proc/<pid>/sysvipc/shm
> /proc/sysvipc -> /proc/self/sysvipc
>
> For sysctl we add a method to fetch the address of
> the variable and perhaps a few other attributes;
> that method is passed a task structure.
>
> Then we can have per process instances of:
> /proc/<pid>/sys/sem
> /proc/<pid>/sys/shmall
> /proc/<pid>/sys/shmmax
> /proc/<pid>/sys/msgmax
> /proc/<pid>/sys/msgmni
> /proc/<pid>/sys/shmmni
> And a symlink at:
> /proc/sys that points to /proc/<pid>/sys
>
> Getting sysvipc to show up in a per process fashion is pretty
> easy. Getting the entire sys hierarchy to show up per process
> is a little harder simply because I think to do it cleanly requires
> helper functions that I don't have yet. I have removed all of
> the internal dependence on magic inode numbers completely removing
> the hard coded inode numbers and putting sys looks doable.
>
> Does that sound like a reasonable model?

hmm, isn't per process a little extreme ... I know
what you want to accomplish but won't this lead to
a per process procfs? and, if you want to do per
process procfs, what would be the gain?

just my opinion ...

best,
Herbert

> >> Part of it is that I have stopped to look more closely at what
> >> other people are doing and to look at alternative implementations.
> > If you need any help with it in OpenVZ, feel free to ask. We have
> > broken-out patches for the recent 2.6.16 kernel.
>
>
> >> One interesting thing I have managed to do is by using ptrace I have
> >> implemented enter for the existing filesystem namespaces without
> >> having to modify the kernel. This at least says that enter and
> >> debugging are two faces of the same coin.
> > Hmmm, strange claim/conclusion... /dev/kmem also allows changing
> > namespaces :) and even obtaining root privileges if needed... :)
>
> True. However this is much less ugly than using /dev/kmem, and it is
> much closer to what applications like user mode linux do. The primary
> question in my mind was what should the permissions checks be when
> performing this kind of action. Using ptrace satisfied that.
>
> So I now have a bounding box for what enter should be able to do
> and what permissions it should take.
>
> > Eric, let's not compare approaches with inches :)
> > As you remember, your PID namespaces don't suit us well... :(
>
> More discussion when the time is right. But I believe I have solved
> the fundamental incompatibility that we had. I asked you a question to
> confirm that a while ago, but I have not heard anything back.
>
> Eric
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 21:01 ` Herbert Poetzl
@ 2006-03-24 21:13 ` Eric W. Biederman
  2006-03-24 21:40 ` Herbert Poetzl
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-03-24 21:13 UTC
To: Herbert Poetzl
Cc: Kirill Korotaev, Dave Hansen, Sam Vilain, linux-kernel, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

Herbert Poetzl <herbert@13thfloor.at> writes:

> hmm, isn't per process a little extreme ... I know
> what you want to accomplish but won't this lead to
> a per process procfs?

Where all of the values vary per process possibly, that
is the way /proc is supposed to be.

/proc/sys is the only case that I think really gets extreme. For
things like /proc/sysvipc and /proc/net it really is a natural break,
and /proc/mounts already shows that the technique works fine.

So I am trying to turn an ugly design choice into a feature :)

> and, if you want to do per
> process procfs, what would be the gain?
>
> just my opinion ...

Under the covers the implementation is per namespace, but
it isn't easy to export it that way from procfs.

In any event this appears to be a way to implement these things while
retaining backwards compatibility, with the current implementation,
and it looks like it can be implemented fairly cleanly.

Eric
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 21:13 ` Eric W. Biederman
@ 2006-03-24 21:40 ` Herbert Poetzl
  2006-03-24 22:30 ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Herbert Poetzl @ 2006-03-24 21:40 UTC
To: Eric W. Biederman
Cc: Kirill Korotaev, Dave Hansen, Sam Vilain, linux-kernel, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

On Fri, Mar 24, 2006 at 02:13:40PM -0700, Eric W. Biederman wrote:
> Herbert Poetzl <herbert@13thfloor.at> writes:
>
> > hmm, isn't per process a little extreme ... I know
> > what you want to accomplish but won't this lead to
> > a per process procfs?
>
> Where all of the values vary per process possibly, that
> is the way /proc is supposed to be.
>
> /proc/sys is the only case that I think really gets extreme. For
> things like /proc/sysvipc and /proc/net it really is a natural break,
> and /proc/mounts already shows that the technique works fine.

well, while /proc/mounts is a good example that it 'works'
it isn't a good example for proper design, as the entire
private namespaces lead to much obfuscation, and having
the mounts per process, where they actually should be per
namespace, and to hide the fact that there are different
namespaces does not help either ...

IMHO a much better design would be to have the namespace
'explicit' and link to that one, containing the mounts entry
btw, this is something which should still be possible
without breaking anything ...

> So I am trying to turn an ugly design choice into a feature :)

hmm, no, you are trying to multiply an ugly design :)

> > and, if you want to do per
> > process procfs, what would be the gain?
> >
> > just my opinion ...
>
> Under the covers the implementation is per namespace, but
> it isn't easy to export it that way from procfs.

why?

/proc/self -> YYY/
/proc/mounts -> self/mounts

(so far nothing new)

/proc/YYY/namespace -> ../namespace-XXX/
/proc/YYY/mounts -> namespace/mounts

(or alternatively)

/proc/namespace -> namespace-XXX/
/proc/mounts -> namespace/mounts

> In any event this appears to be a way to implement these things while
> retaining backwards compatibility, with the current implementation,
> and it looks like it can be implemented fairly cleanly.

I don't see any differences regarding compatibility when
things like namespaces get explicit ...

best,
Herbert

> Eric
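A small userspace sketch of the "explicit namespace" layout proposed in this message, assuming a task simply holds a pointer to a shared namespace object and the `/proc/YYY/namespace` link is derived from that object's id. The types and functions (`struct mnt_namespace`, `struct task`, `namespace_link`, `task_mounts`, `same_namespace_link`) are hypothetical names made up for illustration; nothing here is existing procfs code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a mount namespace with a visible id. */
struct mnt_namespace {
    unsigned int id;
    const char *mounts;   /* what namespace-XXX/mounts would contain */
};

/* Hypothetical stand-in for a task that belongs to one namespace. */
struct task {
    int pid;
    struct mnt_namespace *ns;
};

/* Write the target of the /proc/<pid>/namespace symlink into buf. */
int namespace_link(const struct task *t, char *buf, size_t len)
{
    assert(t != NULL && t->ns != NULL && buf != NULL);
    return snprintf(buf, len, "../namespace-%u", t->ns->id);
}

/* /proc/<pid>/mounts resolves through the namespace, not the task. */
const char *task_mounts(const struct task *t)
{
    assert(t != NULL && t->ns != NULL);
    return t->ns->mounts;
}

/* Two tasks share a namespace exactly when their links are identical. */
int same_namespace_link(const struct task *a, const struct task *b)
{
    char la[32], lb[32];

    namespace_link(a, la, sizeof(la));
    namespace_link(b, lb, sizeof(lb));
    return strcmp(la, lb) == 0;
}
```

The design point the sketch tries to show is that sharing becomes visible: comparing two link targets tells you whether two processes see the same mounts, which the per-process `/proc/<pid>/mounts` view hides.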
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 21:40 ` Herbert Poetzl
@ 2006-03-24 22:30 ` Eric W. Biederman
  2006-03-25 18:37 ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2006-03-24 22:30 UTC
To: Herbert Poetzl
Cc: Kirill Korotaev, Dave Hansen, Sam Vilain, linux-kernel, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

Herbert Poetzl <herbert@13thfloor.at> writes:

> well, while /proc/mounts is a good example that it 'works'
> it isn't a good example for proper design, as the entire
> private namespaces lead to much obfuscation, and having
> the mounts per process, where they actually should be per
> namespace, and to hide the fact that there are different
> namespaces does not help either ...
>
> IMHO a much better design would be to have the namespace
> 'explicit' and link to that one, containing the mounts entry
> btw, this is something which should still be possible
> without breaking anything ...

Actually I agree. That should work for everything except sysctl.

The tricky bit is going to be sticking a pid on the namespace group.
But the patch should be quite simple.

>> So I am trying to turn an ugly design choice into a feature :)
>
> hmm, no, you are trying to multiply an ugly design :)

Well only a bit :) I'm still trying to turn the fact that weird
things wound up in /proc into a feature.

> /proc/self -> YYY/
> /proc/mounts -> self/mounts
>
> (so far nothing new)
>
> /proc/YYY/namespace -> ../namespace-XXX/
> /proc/YYY/mounts -> namespace/mounts
>
> (or alternatively)
>
> /proc/namespace -> namespace-XXX/
> /proc/mounts -> namespace/mounts

Yes. Something like that. It will take a little thinking. But
something that doesn't go away when a process does.

>> In any event this appears to be a way to implement these things while
>> retaining backwards compatibility, with the current implementation,
>> and it looks like it can be implemented fairly cleanly.
>
> I don't see any differences regarding compatibility when
> things like namespaces get explicit ...

Agreed.

Eric
* Re: [RFC] [PATCH 0/7] Some basic vserver infrastructure
  2006-03-24 22:30 ` Eric W. Biederman
@ 2006-03-25 18:37 ` Eric W. Biederman
  0 siblings, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2006-03-25 18:37 UTC
To: Herbert Poetzl
Cc: Kirill Korotaev, Dave Hansen, Sam Vilain, linux-kernel, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

ebiederm@xmission.com (Eric W. Biederman) writes:

> Herbert Poetzl <herbert@13thfloor.at> writes:
>
>> well, while /proc/mounts is a good example that it 'works'
>> it isn't a good example for proper design, as the entire
>> private namespaces lead to much obfuscation, and having
>> the mounts per process, where they actually should be per
>> namespace, and to hide the fact that there are different
>> namespaces does not help either ...
>>
>> IMHO a much better design would be to have the namespace
>> 'explicit' and link to that one, containing the mounts entry
>> btw, this is something which should still be possible
>> without breaking anything ...
>
> Actually I agree. That should work for everything except sysctl.
>
> The tricky bit is going to be sticking a pid on the namespace group.
> But the patch should be quite simple.

Actually the tricky bit is that there is no way to list resources
that processes share, except by looking at the processes. Changing
that has performance implications and is at least slightly
non-trivial. Given that the primary upside is better debugging, I'm
not a fan.

I would much rather modify the interfaces to have double counts and
work like the network devices, where you get warnings when someone is
still using the device after it has been made to go away.

I appreciate the concern, and I share it. I just don't think that
right now there is a good mechanism to get better visibility.

Eric
[parent not found: <20060321061333.27638.9112.stgit@localhost.localdomain>]
* Re: [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info)
  [not found] ` <20060321061333.27638.9112.stgit@localhost.localdomain>
@ 2006-03-21 18:53 ` Dave Hansen
  2006-03-21 21:52 ` Sam Vilain
  2006-03-22 2:02 ` Herbert Poetzl
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Hansen @ 2006-03-21 18:53 UTC
To: Sam Vilain
Cc: linux-kernel, Herbert Poetzl, Eric W. Biederman, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

On Tue, 2006-03-21 at 18:13 +1200, Sam Vilain wrote:
> +static inline void release_vx_info(struct vx_info *vxi,
> +                struct task_struct *task)
> +{
> +        might_sleep();
> +
> +        if (atomic_dec_and_test(&vxi->vx_tasks))
> +                unhash_vx_info(vxi);
> +}

Are these better handled by krefs and their destructors?

-- Dave
* Re: [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info)
  2006-03-21 18:53 ` [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info) Dave Hansen
@ 2006-03-21 21:52 ` Sam Vilain
  2006-03-22 2:02 ` Herbert Poetzl
  1 sibling, 0 replies; 29+ messages in thread
From: Sam Vilain @ 2006-03-21 21:52 UTC
To: Dave Hansen
Cc: linux-kernel, Herbert Poetzl, Eric W. Biederman, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

Dave Hansen wrote:

>On Tue, 2006-03-21 at 18:13 +1200, Sam Vilain wrote:
>
>>+static inline void release_vx_info(struct vx_info *vxi,
>>+                struct task_struct *task)
>>+{
>>+        might_sleep();
>>+
>>+        if (atomic_dec_and_test(&vxi->vx_tasks))
>>+                unhash_vx_info(vxi);
>>+}
>
>Are these better handled by krefs and their destructors?
>

It does seem a little clumsy, doesn't it.

I've found Documentation/kref.txt and will rewrite those functions to
use this API.

Sam.
* Re: [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info)
  2006-03-21 18:53 ` [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info) Dave Hansen
  2006-03-21 21:52 ` Sam Vilain
@ 2006-03-22 2:02 ` Herbert Poetzl
  1 sibling, 0 replies; 29+ messages in thread
From: Herbert Poetzl @ 2006-03-22 2:02 UTC
To: Dave Hansen
Cc: Sam Vilain, linux-kernel, Eric W. Biederman, OpenVZ developers list, Serge E. Hallyn, Andrew Morton

On Tue, Mar 21, 2006 at 10:53:05AM -0800, Dave Hansen wrote:
> On Tue, 2006-03-21 at 18:13 +1200, Sam Vilain wrote:
> > +static inline void release_vx_info(struct vx_info *vxi,
> > +                struct task_struct *task)
> > +{
> > +        might_sleep();
> > +
> > +        if (atomic_dec_and_test(&vxi->vx_tasks))
> > +                unhash_vx_info(vxi);
> > +}
>
> Are these better handled by krefs and their destructors?

well, those were there long before krefs got into
the kernel, IIRC :)

best,
Herbert

> -- Dave
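A userspace sketch of the pattern the kref question points at: a reference count whose final put invokes a release callback, so the "decrement and, if zero, clean up" logic lives in one place. This is an approximation built on C11 atomics, not the kernel's kref implementation; `struct ref`, `ref_init`, `ref_get`, `ref_put`, `struct vx_info_sketch` and `vx_release` are hypothetical names chosen only to mirror the shape of `kref_init`, `kref_get` and `kref_put`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue of struct kref. */
struct ref {
    atomic_int count;
};

/* Start with one reference, held by the creator. */
void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);
}

void ref_get(struct ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/*
 * Drop one reference. When the last one goes away, call release()
 * and return 1; otherwise return 0.
 */
int ref_put(struct ref *r, void (*release)(struct ref *))
{
    assert(release != NULL);
    /* atomic_fetch_sub returns the value held before the subtraction. */
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}

/* Hypothetical stand-in for vx_info: a task count plus a "hashed" flag. */
struct vx_info_sketch {
    struct ref tasks;
    int hashed;
};

/* Destructor: recover the container from the embedded ref and unhash it. */
void vx_release(struct ref *r)
{
    struct vx_info_sketch *vxi = (struct vx_info_sketch *)
        ((char *)r - offsetof(struct vx_info_sketch, tasks));

    vxi->hashed = 0;
}
```

With this shape, the body of a function like the quoted `release_vx_info()` would reduce to a single `ref_put(&vxi->tasks, vx_release)` call, with the unhash step moved into the destructor.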
Thread overview: 29+ messages
[not found] <20060321061333.27638.63963.stgit@localhost.localdomain>
2006-03-21 18:50 ` [RFC] [PATCH 0/7] Some basic vserver infrastructure Dave Hansen
2006-03-21 21:08 ` Sam Vilain
2006-03-21 21:32 ` Dave Hansen
2006-03-21 23:12 ` Sam Vilain
2006-03-22 5:18 ` Sam Vilain
2006-03-22 7:13 ` Eric W. Biederman
2006-03-23 4:17 ` Sam Vilain
2006-03-24 15:36 ` [Devel] " Kirill Korotaev
2006-03-27 12:45 ` Serge E. Hallyn
2006-03-28 5:28 ` Sam Vilain
2006-03-29 12:07 ` Kirill Korotaev
2006-03-29 13:47 ` Serge E. Hallyn
2006-03-29 21:30 ` Sam Vilain
2006-04-19 7:50 ` Eric W. Biederman
2006-04-19 21:42 ` Sam Vilain
2006-03-22 6:41 ` Eric W. Biederman
2006-03-23 4:29 ` Sam Vilain
2006-03-23 4:50 ` Andrew Morton
2006-03-24 15:38 ` Kirill Korotaev
2006-03-24 15:37 ` Kirill Korotaev
2006-03-24 20:28 ` Eric W. Biederman
2006-03-24 21:01 ` Herbert Poetzl
2006-03-24 21:13 ` Eric W. Biederman
2006-03-24 21:40 ` Herbert Poetzl
2006-03-24 22:30 ` Eric W. Biederman
2006-03-25 18:37 ` Eric W. Biederman
[not found] ` <20060321061333.27638.9112.stgit@localhost.localdomain>
2006-03-21 18:53 ` [RFC] [PATCH 1/7] Add process virtualisation umbrella structure (vx_info) Dave Hansen
2006-03-21 21:52 ` Sam Vilain
2006-03-22 2:02 ` Herbert Poetzl