* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Serge E. Hallyn @ 2010-02-02 18:18 UTC
To: Daniel Lezcano
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> Jean-Marc Pigeon wrote:
> > Hello,
> >
> > On Tue, 2010-02-02 at 04:16 +0100, Michael Holzt wrote:
> >
> >>> Tried 2.6.33-rc6 to check container, 3 bugs show up.
> >>> (test done on x86_64, Pentium(R) Dual-Core CPU E5400)
> >>>
> >> I guess this should better go on the containers mailing list,
> >> as this are kernel related problems?
> >>
> >
> > Yes, you are right....I'll do it.
> >
> > Problem with /proc/kmsg (bug #3) is a very real
> > concern, Daniel Lezcano proposed a solution
> > (using fuse), but I think this solution is
> > just a patch (container sys-admin can override
> > it, putting the whole system in total jeopardy).
> >
> > Seems kernel team is very reluctant to make
> > the K ring buffer virtual but I see no other
> > solution (used already in openVZ).
> >
>
> Maybe I missed something, but AFAIR Serge Hallyn was willing to do this,
> no ?
> Or there was a nack from someone ?

I was wondering out loud about the best design to solve his problem.

If we try to redirect kernel-generated messages to containers, we have
several problems, including whether we need to duplicate the messages
to the host container. So in one sense it seems more flexible to
   1. send everything to the host syslog
   2. clamp down on syslog use by processes not in the init_user_ns
   3. let the userspace on the host copy messages into a socket or
      file so the child container can pretend it has a real syslog.
-serge
------------------------------------------------------------------------------
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
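Option 3 above, where host userspace copies messages into a per-container file or socket, can be sketched in a few lines of userspace C. This is a minimal sketch. It assumes each kernel message line carries a hypothetical "[ct:<name>] " prefix identifying its container. The kernel emits no such tag, and the names parse_ct_tag(), relay_line() and the open_ct_log callback are illustrative only.

```c
/* Host-side relay for option 3 (sketch). ASSUMPTION: container messages
 * carry a hypothetical "[ct:<name>] " prefix; the kernel emits no such tag. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split "[ct:web1] eth0: link up" into ct = "web1", *msg = "eth0: link up".
 * Returns 1 for a tagged line, 0 for a host-only (or malformed) line. */
int parse_ct_tag(const char *line, char *ct, size_t ctlen, const char **msg)
{
    static const char prefix[] = "[ct:";
    const size_t plen = sizeof(prefix) - 1;
    const char *end;
    size_t n;

    assert(line != NULL && ct != NULL && ctlen > 0 && msg != NULL);
    *msg = line;
    if (strncmp(line, prefix, plen) != 0)
        return 0;
    end = strchr(line + plen, ']');
    if (end == NULL)
        return 0;                        /* unterminated tag: host only */
    n = (size_t)(end - (line + plen));
    if (n == 0 || n >= ctlen)
        return 0;                        /* empty or oversized name */
    memcpy(ct, line + plen, n);
    ct[n] = '\0';
    *msg = end + 1;
    if (**msg == ' ')
        (*msg)++;
    return 1;
}

/* Every line is kept on the host (option 1); tagged lines are also copied
 * to the container's log. open_ct_log is a caller-supplied (hypothetical)
 * callback returning an open stream for that container, or NULL. */
void relay_line(const char *line, FILE *host_log,
                FILE *(*open_ct_log)(const char *ct))
{
    char ct[64];
    const char *msg;

    fprintf(host_log, "%s\n", line);
    if (parse_ct_tag(line, ct, sizeof(ct), &msg)) {
        FILE *f = open_ct_log(ct);
        if (f != NULL)
            fprintf(f, "%s\n", msg);
    }
}
```

The host copy is written unconditionally, so the host administrator still sees everything. How a tag like this would get attached to each message in the first place is the unresolved part of the design.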
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Jean-Marc Pigeon @ 2010-02-02 18:43 UTC
To: Serge E. Hallyn
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello,

>
> I was wondering out loud about the best design to solve his problem.
>
> If we try to redirect kernel-generated messages to containers, we have
> several problems, including whether we need to duplicate the messages
> to the host container. So in one sense it seems more flexible to
>    1. send everything to host syslog
 No. If we do that, all CONT messages will land in
 the same bucket and it will be difficult to sort
 them out.
 The CONT sys_admin and the HOST sys_admin could be different
 "entities": you are debugging the CONT config, and the critical
 information you need reaches HOST (which you do not
 have access to).
>    2. clamp down on syslog use by processes not in the init_user_ns
 Could you give me more detail?
>    3. let the userspace on the host copy messages into a socket or
>       file so child container can pretend it has real syslog.

So you trap printk messages from CONT on the HOST and
redirect them to CONT, but on a standard syslog channel.
That seems OK to me, as long as /proc/kmsg does not exist
(or is /dev/null) in the CONT file tree.
--
A bientôt
==========================================================================
Jean-Marc Pigeon                                   Internet: jmp@safe.ca
SAFE Inc.                                          Phone: (514) 493-4280
                                                     Fax: (514) 493-1946
       Clement, 'a kiss solution' to get rid of SPAM (at last)
          Clement' Home base <"http://www.clement.safe.ca">
==========================================================================
_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Serge E. Hallyn @ 2010-02-02 21:32 UTC
To: Jean-Marc Pigeon
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> Hello,
>
> >
> > I was wondering out loud about the best design to solve his problem.
> >
> > If we try to redirect kernel-generated messages to containers, we have
> > several problems, including whether we need to duplicate the messages
> > to the host container. So in one sense it seems more flexible to
> >    1. send everything to host syslog
> No, if we do that all CONTs message will reach
> the same bucket and it will be difficult to sort
> them out..
> CONT sys_admin and HOST sys_admin could be different
> "entity", so you debug CONT config and critical
> needed information reach HOST (which you do not
> have access to).

Yes, so a privileged task on HOST must pass that information back to
you on CONT. That is not a valid complaint imo. But how to sort the
msgs out is a valid question.

We need some sort of identifier, unique system-wide, attached to... something.
Is ifindex unique system-wide right now? Oh, IIRC it is, but we want it to
be containerized, so that would be a bad choice :)

> >    2. clamp down on syslog use by processes not in the init_user_ns
> Could give me more detail??...

The simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
altogether from a container, or to only allow reading/writing messages
to its own syslog. (I had hoped to find time to try out the second option but
simply haven't had the time, and it doesn't look like I will very soon.
So if anyone else wants to, please jump at it...)

Then /proc/kmsg can provide what I described above through a FUSE file,
and if, as you mentioned, the container unmounts the FUSE fs and gets
to the real procfs, they just get nothing.

> >    3. let the userspace on the host copy messages into a socket or
> >       file so child container can pretend it has real syslog.
>
> So you trap printk message from CONT on the HOST and
> redirect them on CONT but on a standard syslog channel.
> Seem OK to me, as long /proc/kmsg is not existing
> (/dev/null) in the CONT file tree.
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Daniel Lezcano @ 2010-02-03 10:51 UTC
To: Serge E. Hallyn
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Serge E. Hallyn wrote:
> Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
[...]
> Then /proc/kmsg can provide what I described above through a FUSE file,
> and if, as you mentioned, the container unmounts the FUSE fs and gets
> to real procfs, they just get nothing.
>
>>>    3. let the userspace on the host copy messages into a socket or
>>>       file so child container can pretend it has real syslog.
>> So you trap printk message from CONT on the HOST and
>> redirect them on CONT but on a standard syslog channel.
>> Seem OK to me, as long /proc/kmsg is not existing
>> (/dev/null) in the CONT file tree.

We have:

 * Commands to sys_syslog:
 *
 *      0 -- Close the log.  Currently a NOP.
 *      1 -- Open the log. Currently a NOP.
 *      2 -- Read from the log.
 *      3 -- Read all messages remaining in the ring buffer.
 *      4 -- Read and clear all messages remaining in the ring buffer
 *      5 -- Clear ring buffer.
 *      6 -- Disable printk to console
 *      7 -- Enable printk to console
 *      8 -- Set level of messages printed to console
 *      9 -- Return number of unread characters in the log buffer
 *     10 -- Return size of the log buffer

And add:

 *     11 -- Create a new ring buffer for the current process and its children

We have, let's say, a global ring buffer that is kept untouched, used by
syslog(2) and printk. When we create a new ring buffer, we allocate it
and assign it to the nsproxy (the global ring buffer is the default in
the nsproxy).

printk keeps writing to the global ring buffer, and syslog(2)
writes to the "namespaced" ring buffer.

Does it make sense?
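Daniel's proposed command 11 can be modelled in userspace C to make the ownership rules concrete. This is a minimal sketch under stated assumptions. All the names are hypothetical: ACT_NEW_BUFFER, struct nsproxy_model, printk_model() and do_syslog_model(). The values 3, 5 and 10 follow the command list above, and 11 is the proposed addition. The buffer never wraps, and refcounting/freeing of namespaced buffers is omitted.

```c
/* Userspace model of the proposed syslog command 11 (sketch only). */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BUF_LEN 1024

struct ring_buffer {
    char   data[LOG_BUF_LEN];
    size_t len;                       /* bytes stored; this model never wraps */
};

struct nsproxy_model {
    struct ring_buffer *syslog_rb;    /* defaults to the global buffer */
};

enum {                                /* numbers from the command list above */
    ACT_READ_ALL    = 3,
    ACT_CLEAR       = 5,
    ACT_SIZE_BUFFER = 10,
    ACT_NEW_BUFFER  = 11              /* proposed: new buffer for self + children */
};

static struct ring_buffer global_rb;

void ns_init(struct nsproxy_model *ns)
{
    ns->syslog_rb = &global_rb;
}

/* printk keeps writing to the global buffer, whoever calls it. */
void printk_model(const char *msg)
{
    size_t n = strlen(msg);

    if (n > LOG_BUF_LEN - global_rb.len)
        n = LOG_BUF_LEN - global_rb.len;   /* model: truncate, don't wrap */
    memcpy(global_rb.data + global_rb.len, msg, n);
    global_rb.len += n;
}

/* syslog(2) actions operate on whatever buffer the caller's nsproxy holds. */
long do_syslog_model(struct nsproxy_model *ns, int action, char *buf, size_t len)
{
    struct ring_buffer *rb;
    size_t n;

    assert(ns != NULL && ns->syslog_rb != NULL);
    rb = ns->syslog_rb;
    switch (action) {
    case ACT_READ_ALL:
        if (buf == NULL)
            return -EINVAL;
        n = rb->len < len ? rb->len : len;
        memcpy(buf, rb->data, n);
        return (long)n;
    case ACT_CLEAR:
        rb->len = 0;
        return 0;
    case ACT_SIZE_BUFFER:
        return LOG_BUF_LEN;
    case ACT_NEW_BUFFER:
        rb = calloc(1, sizeof(*rb));
        if (rb == NULL)
            return -ENOMEM;
        ns->syslog_rb = rb;   /* children would inherit it via nsproxy;
                                 old buffer is not refcounted/freed here */
        return 0;
    default:
        return -EINVAL;
    }
}
```

Suppose a container issues command 11. A later clear (command 5) from that container then empties only its own buffer, and the host can still read everything printk wrote to the global one.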
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Jean-Marc Pigeon @ 2010-02-03 13:24 UTC
To: Daniel Lezcano
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello

[..]
> We have, let's say a global ring buffer keep untouched, used by
> syslog(2) and printk. When we create a new ring buffer, we allocate it
> and assign to the nsproxy (global ring buffer is the default in the
> nsproxy).
>
> The printk keeps writing in the global ring buffer and the syslog(2)
> writes to the "namespaced" ring buffer.
>
> Does it makes sense ?

I like this idea, as it gives us flexibility.

Caution: iptables packet logging uses printk (AFAIK) for its tracing,
and iptables can be used within CONT, on top of the iptables rules
in HOST (HOST filtering being the superset).
So printk from CONT must also write to the "namespaced" ring buffer.
--
A bientôt
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Serge E. Hallyn @ 2010-02-03 15:03 UTC
To: Daniel Lezcano
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
[...]
> We have, let's say a global ring buffer keep untouched, used by
> syslog(2) and printk. When we create a new ring buffer, we allocate
> it and assign to the nsproxy (global ring buffer is the default in
> the nsproxy).
>
> The printk keeps writing in the global ring buffer and the syslog(2)
> writes to the "namespaced" ring buffer.
>
> Does it makes sense ?

Yeah, it's a nice alternative. Though (1) there is something to be said for
forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
new ring buffer is pointed to from nsproxy, it might be frowned upon to do
an unshare/clone action in yet another way.

I still think our first concern should be safety, and that we should consider
just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
clone(CLONE_NEWUSER). Any sys_syslog() or /proc/kmsg access returns -EINVAL
after that. Then we can discuss whether and how to target printks to
namespaces, and whether duplicates should be sent to parent namespaces.

After we start getting flexible with syslog, the next request will be for
audit flexibility. I don't even know whether our netlink support suffices for
that right now.

(So, this all does turn into a big deal...)

-serge
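Serge's "safety first" variant fits in a few lines of userspace C. This is a minimal model, and struct nsproxy_model, copy_namespaces_model() and syslog_access_model() are hypothetical names. Only the flag value 0x10000000 mirrors the real CLONE_NEWUSER. A child inherits the syslog pointer on clone. The pointer is set to NULL when CLONE_NEWUSER is passed, and any later syslog access then fails with -EINVAL.

```c
/* Model of "syslog_struct on nsproxy, NULL after CLONE_NEWUSER" (sketch). */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_CLONE_NEWUSER 0x10000000UL   /* same value as CLONE_NEWUSER */

struct syslog_struct {
    int placeholder;                       /* real buffer state would live here */
};

struct nsproxy_model {
    struct syslog_struct *syslog;
};

static struct syslog_struct init_syslog;

void init_nsproxy_model(struct nsproxy_model *ns)
{
    ns->syslog = &init_syslog;
}

/* On clone, the child shares the parent's syslog_struct unless a new user
 * namespace is requested, in which case it gets none at all. */
void copy_namespaces_model(unsigned long flags,
                           const struct nsproxy_model *parent,
                           struct nsproxy_model *child)
{
    assert(parent != NULL && child != NULL);
    child->syslog = (flags & MODEL_CLONE_NEWUSER) ? NULL : parent->syslog;
}

/* Stands in for both sys_syslog() and open("/proc/kmsg"). */
int syslog_access_model(const struct nsproxy_model *ns)
{
    return ns->syslog == NULL ? -EINVAL : 0;
}
```

In this design the restriction is sticky. A child of a locked-out task inherits the NULL pointer, so every descendant of the container stays locked out, even one that never creates another user namespace.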
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Jean-Marc Pigeon @ 2010-02-03 15:48 UTC
To: Serge E. Hallyn
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello,

[...]
> I still think our first concern should be safety, and that we should consider
> just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> clone(CLONE_NEWUSER). any sys_syslog() or /proc/kmsg access returns -EINVAL
> after that. Then we can discuss whether and how to target printks to
> namespaces, and whether duplicates should be sent to parent namespaces.
Returning -EINVAL for /proc/kmsg would solve the corruption of HOST's own
ring buffer, but I am not sure what -EINVAL from sys_syslog() would mean:
rsyslog MUST be able to run within CONT, right?

Duplicating printk messages per namespace and sending them to the parent
namespace is not a good idea (duplicating and forwarding is the job of
tools such as rsyslogd).

>
> After we start getting flexible with syslog, the next request will be for
> audit flexibility. I don't even know how our netlink support suffices for
> that right now.
>
> (So, this all does turn into a big deal...)
>
> -serge
--
A bientôt
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
From: Serge E. Hallyn @ 2010-02-03 16:21 UTC
To: Jean-Marc Pigeon
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> Hello,
>
> [...]
> /proc/kmsg=-EINVAL will resolve the own HOST: ring buffer corruption
> not sure what sys_syslog()=-EINVAL mean???, rsyslog MUST be able to
> run within CONT: right?
>
> printk namespaces duplicate and sent to parent namespace
> is not a good idea (duplicating&forwarding is done by tools as rsyslogd).

Heh, agreed. I just disagree that we should trust CONT to forward stuff to
HOST; rather, we should have HOST forward stuff to CONT. It comes down to
whether HOST might need the info to determine what CONT, or someone
attacking CONT, is up to.

To the extent that HOST will never be fully safe from CONT (and if you think
it is, even using kvm/vmware, then you're deluding yourself), I think it's
misguided to keep information from HOST.
* Re: Kernel 2.6.33-rc6, 3 bugs container specific. [not found] ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2010-02-03 15:48 ` Jean-Marc Pigeon @ 2010-02-04 9:33 ` Daniel Lezcano [not found] ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org> 1 sibling, 1 reply; 12+ messages in thread From: Daniel Lezcano @ 2010-02-04 9:33 UTC (permalink / raw) To: Serge E. Hallyn Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Serge E. Hallyn wrote: > Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org): > >> Serge E. Hallyn wrote: >> >>> Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org): >>> >>>> Hello, >>>> >>>> >>>> >>>>> I was wondering out loud about the best design to solve his problem. >>>>> >>>>> If we try to redirect kernel-generated messages to containers, we have >>>>> several problems, including whether we need to duplicate the messages >>>>> to the host container. So in one sense it seems more flexible to >>>>> 1. send everything to host syslog >>>>> >>>> No, if we do that all CONTs message will reach >>>> the same bucket and it will be difficult to sort >>>> them out.. >>>> CONT sys_admin and HOST sys_admin could be different >>>> "entity", so you debug CONT config and critical >>>> needed information reach HOST (which you do not have access >>>> to). >>>> >>> Yes, so a privileged task on HOST must pass that information back to >>> you on CONT. That is not a valid complaint imo. But how to sort the >>> msgs out is a valid question. >>> >>> We need some sort of identifier, unique system-wide, attached to.. something. >>> Is ifindex unique system-wide right now? Oh, IIRC it is, but we wnat it to >>> be containerized, so that would be a bad choice :) >>> >>> >>>>> 2. clamp down on syslog use by processes not in the init_user_ns >>>>> >>>> Could give me more detail??... 
>>>> >>> Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg) >>> altogether from a container, or to only allow reading/writing messages >>> to own syslog. (I had hoped to find time to try out the second option but >>> simply haven't had the time, and it doesn't look like I will very soon. >>> So if anyone else wants to, pls jump at it...) >>> >>> Then /proc/kmsg can provide what I described above through a FUSE file, >>> and if, as you mentioned, the container unmounts the FUSE fs and gets >>> to real procfs, they just get nothing. >>> >>> >>>>> 3. let the userspace on the host copy messages into a socket or >>>>> file so child container can pretend it has real syslog. >>>>> >>>> So you trap printk message from CONT on the HOST and >>>> redirect them on CONT but on a standard syslog channel. >>>> Seem OK to me, as long /proc/kmsg is not existing >>>> (/dev/null) in the CONT file tree. >>>> >> We have: >> * Commands to sys_syslog: >> * >> * 0 -- Close the log. Currently a NOP. >> * 1 -- Open the log. Currently a NOP. >> * 2 -- Read from the log. >> * 3 -- Read all messages remaining in the ring buffer. >> * 4 -- Read and clear all messages remaining in the ring buffer >> * 5 -- Clear ring buffer. >> * 6 -- Disable printk to console >> * 7 -- Enable printk to console >> * 8 -- Set level of messages printed to console >> * 9 -- Return number of unread characters in the log buffer >> * 10 -- Return size of the log buffer >> >> And add: >> * 11 -- create a new ring buffer for the current process >> and its childs >> >> >> We have, let's say a global ring buffer keep untouched, used by >> syslog(2) and printk. When we create a new ring buffer, we allocate >> it and assign to the nsproxy (global ring buffer is the default in >> the nsproxy). >> >> The prink keeps writing in the global ring buffer and the syslog(2) >> writes to the "namespaced" ring buffer. >> >> Does it makes sense ? >> > > Yeah, it's a nice alternative. 
Though (1) there is something to be said for > forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the > new ring buffer is pointed to from nsproxy, it might be frowned upon to do > an unshare/clone action in yet another way. > Why do you want to tie clone(CLONE_NEWUSER) with a new ring buffer ? I mean one may want to use CLONE_NEWUSER but keep the ring buffer, no ? > I still think our first concern should be safety, and that we should consider > just adding 'struct syslog_struct' to nsproxy, and making that NULL on a > clone(CLONE_NEWUSER). any sys_syslog() or /proc/kmsg access returns -EINVAL > after that. Then we can discuss whether and how to target printks to > namespaces, and whether duplicates should be sent to parent namespaces. > That makes sense to do it step by step. Targeting the printk is the more difficult, no ? I mean you should have always the destination namespace available which is not obvious when the printk is called from an interrupt context. > After we start getting flexible with syslog, the next request will be for > audit flexibility. I don't even know how our netlink support suffices for > that right now. > > (So, this all does turn into a big deal...) > Mmh ... right. ------------------------------------------------------------------------------ The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com ^ permalink raw reply [flat|nested] 12+ messages in thread
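The interrupt-context difficulty Daniel raises appears as soon as one tries to write the routing rule down. This is a minimal sketch that assumes a hypothetical per-task syslog_rb pointer and a boolean standing in for the kernel's interrupt-context test. In process context the message can follow the calling task to its namespace's buffer. From an interrupt there is no meaningful current task, so this version falls back to the global buffer.

```c
/* Choosing a destination buffer for printk (sketch). */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ring_buffer {
    const char *name;
};

struct task_model {
    struct ring_buffer *syslog_rb;     /* hypothetical per-task pointer */
};

static struct ring_buffer global_rb = { "global" };

/* In process context, follow the calling task to its namespace's buffer.
 * In interrupt context the interrupted task is unrelated to the message,
 * so this sketch falls back to the global buffer. */
struct ring_buffer *printk_target(bool in_irq, const struct task_model *cur)
{
    if (in_irq || cur == NULL || cur->syslog_rb == NULL)
        return &global_rb;
    return cur->syslog_rb;
}
```

That fallback is where Jean-Marc's iptables caveat bites. Any container-related message emitted outside the container's process context would land in the global buffer, unless the caller passes the destination namespace explicitly.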
* Re: [Lxc-users] Kernel 2.6.33-rc6, 3 bugs container specific. [not found] ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org> @ 2010-02-04 15:19 ` Serge E. Hallyn [not found] ` <20100204151927.GA7556-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Serge E. Hallyn @ 2010-02-04 15:19 UTC (permalink / raw) To: Daniel Lezcano Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org): > Serge E. Hallyn wrote: > >Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org): > >>Serge E. Hallyn wrote: > >>>Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org): > >>>>Hello, > >>>> > >>>> > >>>>>I was wondering out loud about the best design to solve his problem. > >>>>> > >>>>>If we try to redirect kernel-generated messages to containers, we have > >>>>>several problems, including whether we need to duplicate the messages > >>>>>to the host container. So in one sense it seems more flexible to > >>>>> 1. send everything to host syslog > >>>> No, if we do that all CONTs message will reach > >>>> the same bucket and it will be difficult to sort > >>>> them out.. > >>>> CONT sys_admin and HOST sys_admin could be different > >>>> "entity", so you debug CONT config and critical > >>>> needed information reach HOST (which you do not have access > >>>>to). > >>>Yes, so a privileged task on HOST must pass that information back to > >>>you on CONT. That is not a valid complaint imo. But how to sort the > >>>msgs out is a valid question. > >>> > >>>We need some sort of identifier, unique system-wide, attached to.. something. > >>>Is ifindex unique system-wide right now? Oh, IIRC it is, but we wnat it to > >>>be containerized, so that would be a bad choice :) > >>> > >>>>> 2. clamp down on syslog use by processes not in the init_user_ns > >>>> Could give me more detail??... 
> >>>Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
> >>>altogether from a container, or to only allow reading/writing messages
> >>>to own syslog. (I had hoped to find time to try out the second option but
> >>>simply haven't had the time, and it doesn't look like I will very soon.
> >>>So if anyone else wants to, pls jump at it...)
> >>>
> >>>Then /proc/kmsg can provide what I described above through a FUSE file,
> >>>and if, as you mentioned, the container unmounts the FUSE fs and gets
> >>>to real procfs, they just get nothing.
> >>>
> >>>>> 3. let the userspace on the host copy messages into a socket or
> >>>>> file so child container can pretend it has real syslog.
> >>>> So you trap printk message from CONT on the HOST and
> >>>> redirect them on CONT but on a standard syslog channel.
> >>>> Seem OK to me, as long /proc/kmsg is not existing
> >>>> (/dev/null) in the CONT file tree.
> >>We have:
> >> * Commands to sys_syslog:
> >> *
> >> * 0 -- Close the log. Currently a NOP.
> >> * 1 -- Open the log. Currently a NOP.
> >> * 2 -- Read from the log.
> >> * 3 -- Read all messages remaining in the ring buffer.
> >> * 4 -- Read and clear all messages remaining in the ring buffer
> >> * 5 -- Clear ring buffer.
> >> * 6 -- Disable printk to console
> >> * 7 -- Enable printk to console
> >> * 8 -- Set level of messages printed to console
> >> * 9 -- Return number of unread characters in the log buffer
> >> * 10 -- Return size of the log buffer
> >>
> >>And add:
> >> * 11 -- create a new ring buffer for the current process
> >>and its children
> >>
> >>
> >>We have, let's say a global ring buffer keep untouched, used by
> >>syslog(2) and printk. When we create a new ring buffer, we allocate
> >>it and assign to the nsproxy (global ring buffer is the default in
> >>the nsproxy).
> >>
> >>The printk keeps writing in the global ring buffer and the syslog(2)
> >>writes to the "namespaced" ring buffer.
> >>
> >>Does it make sense ?
> >
> >Yeah, it's a nice alternative. Though (1) there is something to be said for
> >forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> >new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> >an unshare/clone action in yet another way.
> Why do you want to tie clone(CLONE_NEWUSER) with a new ring buffer ?
> I mean one may want to use CLONE_NEWUSER but keep the ring buffer, no ?

Hmm, well yesterday I was thinking no, but I guess you're right. I may
be wanting to remap userids and not contain root.

I still like your syslog command 11, but assuming we want to keep the
syslog_ns on nsproxy, I think we really need to stick to clone/unshare.
So if we want to add a CLONE_SYSLOG flag, we have to wait until eclone
gets us more clone flags :) Or, pull out the eclone patchset from
linux-cr and make it prereq for this.

> >I still think our first concern should be safety, and that we should consider
> >just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> >clone(CLONE_NEWUSER). any sys_syslog() or /proc/kmsg access returns -EINVAL
> >after that. Then we can discuss whether and how to target printks to
> >namespaces, and whether duplicates should be sent to parent namespaces.
> That makes sense to do it step by step. Targeting the printk is the
> more difficult, no ? I mean you should have always the destination
> namespace available which is not obvious when the printk is called
> from an interrupt context.
>
> >After we start getting flexible with syslog, the next request will be for
> >audit flexibility. I don't even know how our netlink support suffices for
> >that right now.
> >
> >(So, this all does turn into a big deal...)
> Mmh ... right.
* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
@ 2010-02-04 16:02 Cedric Le Goater
0 siblings, 0 replies; 12+ messages in thread
From: Cedric Le Goater @ 2010-02-04 16:02 UTC
To: Serge E. Hallyn
Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On 02/04/2010 04:19 PM, Serge E. Hallyn wrote:
> I still like your syslog command 11, but assuming we want to keep the
> syslog_ns on nsproxy, I think we really need to stick to clone/unshare.

Yes. let's keep the namespace creation API consistent. this is
complex enough.

C.
* Kernel 2.6.33-rc6, 3 bugs container specific.
@ 2010-02-02 14:46 Jean-Marc Pigeon
0 siblings, 0 replies; 12+ messages in thread
From: Jean-Marc Pigeon @ 2010-02-02 14:46 UTC
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Hello,
I tried 2.6.33-rc6 to check containers, and 3 bugs showed up.
(Test done on x86_64, Pentium(R) Dual-Core CPU E5400.)
#1: Critical / fixed?:
Already reported: the system hangs very badly if you start
a container (clone) while the clone flags include at least
one of:
CLONE_NEWNET|CLONE_NEWIPC|CLONE_NEWNS|CLONE_NEWPID|CLONE_NEWUTS.
The bug is said to be fixed
(commit fabf318e5e4bda0aca2b0d617b191884fda62703),
and the fix is somewhere in the queue; hopefully it will be part of rc7.
#2: Trouble / can be overridden by sys_admin
arping does not work if the HOST interface is not named
the same as in CONT.
Let's say you rename the HOST "eth0" interface to
"fast" to meet whatever your standards are, and
rename the CONT veth to eth0 using the command:
ip link set vth_name name eth0
(within CONT) to allow a very standard CONT template.
Directory HOST:/sys/class/net will report
br0 fast lo sit0 'to-vth'
Directory CONT:/sys/class/net will report
exactly the same (so there is no eth0 in it).
Problem: the script
/etc/sysconfig/network-scripts/ifup-eth
runs "ip link set dev eth0 up", as
eth0 is the name we want to have in CONT.
So far so good, but just after that, arping
tries to make sure no one else is using the
IP address about to be set,
and arping accesses the file
/sys/class/net/eth0/broadcast,
which doesn't exist --> network setup hangs!
Fix: when "ip link set vth_name name othername"
is applied, /sys/class/net/ should be updated
by the kernel too.
#3: Very critical / CONT can't be production grade.
HOST and CONT share the same kmsg ring buffer.
Parts of the kernel running in CONT context
can printk CONT-specific messages (iptables
packet tracing is a good example). Even worse,
CONT:rsyslog reads kmsg too, meaning
it competes with HOST:rsyslog to get
critical information. So the whole ring buffer
is garbled (not good at all).
My advice is to give each started container its own
"ring buffer". This is the way it was implemented
by the OpenVZ guys (it seems to me a very good
solution). Another solution would be to make
CONT:/proc/kmsg a kind of /dev/null, but then
how would the kernel give the container context
information about its specific CONT
problems???
My 3 cents.
It seems to me we are very close to having a "production"
container, thanks to all contributors...
--
See you soon
==========================================================================
Jean-Marc Pigeon Internet: jmp@safe.ca
SAFE Inc. Phone: (514) 493-4280
Fax: (514) 493-1946
Clement, 'a kiss solution' to get rid of SPAM (at last)
Clement' Home base <"http://www.clement.safe.ca">
==========================================================================
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Thread overview: 12+ messages
[not found] <1265074676.6260.212.camel@Mercier.safe.ca>
[not found] ` <20100202031647.GA14318@fqdn.org>
[not found] ` <1265121846.6260.231.camel@Mercier.safe.ca>
[not found] ` <4B68649D.2000503@free.fr>
[not found] ` <4B68649D.2000503-GANU6spQydw@public.gmane.org>
2010-02-02 18:18 ` Kernel 2.6.33-rc6, 3 bugs container specific Serge E. Hallyn
[not found] ` <20100202181801.GA28412-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-02 18:43 ` Jean-Marc Pigeon
[not found] ` <1265136215.6260.261.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
2010-02-02 21:32 ` Serge E. Hallyn
[not found] ` <20100202213254.GH32305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-03 10:51 ` Daniel Lezcano
[not found] ` <4B695535.7020301-GANU6spQydw@public.gmane.org>
2010-02-03 13:24 ` Jean-Marc Pigeon
2010-02-03 15:03 ` Serge E. Hallyn
[not found] ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-03 15:48 ` Jean-Marc Pigeon
[not found] ` <1265212090.6260.284.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
2010-02-03 16:21 ` Serge E. Hallyn
2010-02-04 9:33 ` Daniel Lezcano
[not found] ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org>
2010-02-04 15:19 ` [Lxc-users] " Serge E. Hallyn
[not found] ` <20100204151927.GA7556-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-04 16:02 ` Cedric Le Goater
2010-02-02 14:46 Jean-Marc Pigeon