* Kernel 2.6.33-rc6, 3 bugs container specific.
@ 2010-02-02 14:46 Jean-Marc Pigeon
  0 siblings, 0 replies; 12+ messages in thread
From: Jean-Marc Pigeon @ 2010-02-02 14:46 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

        I tried 2.6.33-rc6 to check container support; 3 bugs showed up.
        (Tests done on x86_64, Pentium(R) Dual-Core CPU E5400.)

        #1: Critical / fixed?:
        Already reported: the system hangs very badly if you start
        a container (clone) with at least one of the following
        clone flags set:
        CLONE_NEWNET|CLONE_NEWIPC|CLONE_NEWNS|CLONE_NEWPID|CLONE_NEWUTS.

        The bug is said to be fixed
        (commit fabf318e5e4bda0aca2b0d617b191884fda62703),
        and the fix is somewhere in the queue; hopefully it will be part of rc7.


        #2: Trouble / can be overridden by the sys_admin:
         arping does not work if the HOST interface is not named
         the same as in CONT.

            Let's say you name the HOST "eth0" interface
            "fast" to meet whatever your standards are, and
            rename the CONT veth to eth0 using the command:
            ip link set vth_name name eth0
            (within CONT) to allow a very standard CONT template.

           Directory HOST:/sys/class/net will report
           br0  fast lo  sit0 'to-vth'

           Directory CONT:/sys/class/net will report
           exactly the same.

           Problem: the file
           /etc/sysconfig/network-scripts/ifup-eth
           runs "ip link set dev eth0 up", as
           eth0 is the name we want to have in CONT.
           So far so good. Just after that, arping
           tries to make sure no one is using the
           IP about to be set,
           and arping accesses the file
           /sys/class/net/eth0/broadcast
           which doesn't exist --> network setup hangs!

           Fix: when "ip link set vth_name name othername"
           is applied, /sys/class/net/ should be updated
           by the kernel too.
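
To make the failure in bug #2 easier to reproduce, here is a minimal
user-space sketch that checks whether the sysfs "broadcast" attribute
exists for a given interface name. The helper names and the
sysfs_net parameter are made up for illustration; only the path
layout /sys/class/net/<ifname>/broadcast comes from the report above.

```python
import os


def broadcast_attr_path(ifname, sysfs_net="/sys/class/net"):
    """Build the path of the sysfs 'broadcast' attribute for ifname."""
    return os.path.join(sysfs_net, ifname, "broadcast")


def has_broadcast_attr(ifname, sysfs_net="/sys/class/net"):
    """Return True if the 'broadcast' attribute file exists for ifname.

    sysfs_net is a hypothetical parameter so the check can be pointed
    at a test directory instead of the real sysfs tree.
    """
    return os.path.isfile(broadcast_attr_path(ifname, sysfs_net))
```

Run inside the container, has_broadcast_attr("eth0") would be expected
to return False in the situation described, since sysfs still lists
the HOST-side names.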


        #3: Very critical / CONT can't be production grade.
            HOST and CONT share the same kmsg ring buffer.

            Some parts of the kernel running on behalf of CONT
            could printk CONT-specific messages (iptables
            packet tracing is a good example). Even worse,
            CONT:rsyslog is reading kmsg too, meaning
            it is competing with HOST:rsyslog to get
            critical information. So the whole ring buffer
            is garbled (not good at all).

            My advice is to give a specific "ring buffer"
            to each started container. This is the way it
            was implemented by the OpenVZ guys (seems to
            me a very good solution). Another solution would
            be to make CONT:/proc/kmsg a kind of
            null device, but then how would the kernel give
            the container context information about its own
            CONT-specific problems???
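
The competition between the two rsyslog instances can be illustrated
with a toy user-space model (this is not kernel code). It assumes a
single buffer whose reads consume messages, as reading /proc/kmsg
does, and two readers that strictly alternate. The class and function
names are invented for the example.

```python
from collections import deque


class SharedKmsg:
    """Toy model of one log buffer whose reads consume messages."""

    def __init__(self):
        self._buf = deque()

    def printk(self, msg):
        """Append a message, as the kernel would."""
        self._buf.append(msg)

    def read(self):
        """Return and remove the oldest message, or None if empty."""
        return self._buf.popleft() if self._buf else None


def interleaved_readers(messages):
    """Let a HOST reader and a CONT reader alternate on one buffer.

    Returns (host_seen, cont_seen). Each reader ends up with only
    part of the log, which is the problem described in bug #3.
    """
    kmsg = SharedKmsg()
    for m in messages:
        kmsg.printk(m)

    host_seen, cont_seen = [], []
    while True:
        m = kmsg.read()          # HOST:rsyslog gets its turn first
        if m is None:
            break
        host_seen.append(m)
        m = kmsg.read()          # then CONT:rsyslog
        if m is None:
            break
        cont_seen.append(m)
    return host_seen, cont_seen
```

Real readers do not alternate this neatly, but the effect is the same:
neither side sees the complete message stream.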


        My 3 cents.
        Seems to me we are very close to having a "production"
        container, thanks to all contributors...
-- 
A bientôt
==========================================================================
Jean-Marc Pigeon                                   Internet: jmp@safe.ca
SAFE Inc.                                          Phone: (514) 493-4280
                                                   Fax:   (514) 493-1946
        Clement, 'a kiss solution' to get rid of SPAM (at last)
           Clement' Home base <"http://www.clement.safe.ca">
==========================================================================

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]       ` <4B68649D.2000503-GANU6spQydw@public.gmane.org>
@ 2010-02-02 18:18         ` Serge E. Hallyn
       [not found]           ` <20100202181801.GA28412-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Serge E. Hallyn @ 2010-02-02 18:18 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> Jean-Marc Pigeon wrote:
> > Hello,
> >
> > On Tue, 2010-02-02 at 04:16 +0100, Michael Holzt wrote:
> >   
> >>> 	Tried 2.6.33-rc6 to check container, 3 bugs show up.
> >>> 	(test done on x86_64, Pentium(R) Dual-Core CPU E5400)
> >>>       
> >> I guess this should better go on the containers mailing list,
> >> as this are kernel related problems?
> >>     
> >
> > 	Yes, you are right....I'll do it.
> >
> > 	Problem with /proc/kmsg (bug #3) is a very real
> > 	concern, Daniel Lezcano proposed a solution
> > 	(using fuse), but I think this solution is
> > 	just a patch (container sys-admin can override 
> > 	it, putting the whole system in total jeopardy).
> >
> > 	Seems kernel team is very reluctant to make 
> > 	the K ring buffer virtual but I see no other
> > 	solution (used already in openVZ).
> >   
> 
> Maybe I missed something, but AFAIR Serge Hallyn was willing to do this, 
> no ?
> Or there was a nack from someone ?

I was wondering out loud about the best design to solve his problem.

If we try to redirect kernel-generated messages to containers, we have
several problems, including whether we need to duplicate the messages
to the host container.  So in one sense it seems more flexible to
	1. send everything to host syslog
	2. clamp down on syslog use by processes not in the init_user_ns
	3. let the userspace on the host copy messages into a socket or
	   file so child container can pretend it has real syslog.
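
Steps 1 and 3 above can be sketched in user space as follows. This is
only a toy model under a strong assumption: every message already
carries a container identifier (None for the host itself), which the
thread does not establish. Plain lists stand in for the per-container
socket or file, and the function name is hypothetical.

```python
def route_messages(tagged_messages, container_sinks):
    """Host-side dispatcher for kernel messages.

    tagged_messages: iterable of (container_id, text); container_id
        is None for messages that belong to the host only.
    container_sinks: dict mapping container_id to a list standing in
        for that container's socket or file.

    Returns the host log, which receives every message (step 1),
    while each container sink gets a copy of its own messages (step 3).
    """
    host_log = []
    for container_id, text in tagged_messages:
        host_log.append((container_id, text))
        sink = container_sinks.get(container_id)
        if sink is not None:
            sink.append(text)
    return host_log
```

Messages tagged with an unknown container simply stay in the host log,
so the host never loses information.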

-serge

------------------------------------------------------------------------------
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]           ` <20100202181801.GA28412-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-02 18:43             ` Jean-Marc Pigeon
       [not found]               ` <1265136215.6260.261.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Marc Pigeon @ 2010-02-02 18:43 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello,


> 
> I was wondering out loud about the best design to solve his problem.
> 
> If we try to redirect kernel-generated messages to containers, we have
> several problems, including whether we need to duplicate the messages
> to the host container.  So in one sense it seems more flexible to
> 	1. send everything to host syslog
		No, if we do that, all CONT messages will land in
		the same bucket and it will be difficult to sort
		them out.
		The CONT sys_admin and the HOST sys_admin could be different
		"entities", so you debug the CONT config while the critically
		needed information goes to HOST (which you do not
		have access to).
> 	2. clamp down on syslog use by processes not in the init_user_ns
		Could you give me more detail?
> 	3. let the userspace on the host copy messages into a socket or
> 	   file so child container can pretend it has real syslog.

		So you trap printk messages from CONT on the HOST and
		redirect them to CONT, but on a standard syslog channel.
		Seems OK to me, as long as /proc/kmsg does not exist
		(or is /dev/null) in the CONT file tree.
		
-- 
A bientôt
==========================================================================
Jean-Marc Pigeon                                   Internet: jmp@safe.ca
SAFE Inc.                                          Phone: (514) 493-4280
                                                   Fax:   (514) 493-1946
        Clement, 'a kiss solution' to get rid of SPAM (at last)
           Clement' Home base <"http://www.clement.safe.ca">
==========================================================================


_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]               ` <1265136215.6260.261.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
@ 2010-02-02 21:32                 ` Serge E. Hallyn
       [not found]                   ` <20100202213254.GH32305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Serge E. Hallyn @ 2010-02-02 21:32 UTC (permalink / raw)
  To: Jean-Marc Pigeon
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> Hello,
> 
> 
> > 
> > I was wondering out loud about the best design to solve his problem.
> > 
> > If we try to redirect kernel-generated messages to containers, we have
> > several problems, including whether we need to duplicate the messages
> > to the host container.  So in one sense it seems more flexible to
> > 	1. send everything to host syslog
> 		No, if we do that all CONTs message will reach
> 		the same bucket and it will be difficult to sort
> 		them out..
> 		CONT sys_admin and HOST sys_admin could be different
> 		"entity", so you debug CONT config and critical
> 		needed information reach HOST (which you do not 
> 		have access to).

Yes, so a privileged task on HOST must pass that information back to
you on CONT.  That is not a valid complaint imo.  But how to sort the
msgs out is a valid question.

We need some sort of identifier, unique system-wide, attached to.. something.
Is ifindex unique system-wide right now?  Oh, IIRC it is, but we want it to
be containerized, so that would be a bad choice :)

> > 	2. clamp down on syslog use by processes not in the init_user_ns
> 		Could give me more detail??...

Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
altogether from a container, or to only allow reading/writing messages
to own syslog.  (I had hoped to find time to try out the second option but
simply haven't had the time, and it doesn't look like I will very soon.
So if anyone else wants to, pls jump at it...)

Then /proc/kmsg can provide what I described above through a FUSE file,
and if, as you mentioned, the container unmounts the FUSE fs and gets
to real procfs, they just get nothing.

> > 	3. let the userspace on the host copy messages into a socket or
> > 	   file so child container can pretend it has real syslog.
> 
> 		So you trap printk message from CONT on the HOST and 
> 		redirect them on CONT but on a standard syslog channel.
> 		Seem OK to me, as long /proc/kmsg is not existing
> 		(/dev/null) in the CONT file tree.
> 		


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                   ` <20100202213254.GH32305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-03 10:51                     ` Daniel Lezcano
       [not found]                       ` <4B695535.7020301-GANU6spQydw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Lezcano @ 2010-02-03 10:51 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Serge E. Hallyn wrote:
> Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
>> Hello,
>>
>>
>>> I was wondering out loud about the best design to solve his problem.
>>>
>>> If we try to redirect kernel-generated messages to containers, we have
>>> several problems, including whether we need to duplicate the messages
>>> to the host container.  So in one sense it seems more flexible to
>>> 	1. send everything to host syslog
>> 		No, if we do that all CONTs message will reach
>> 		the same bucket and it will be difficult to sort
>> 		them out..
>> 		CONT sys_admin and HOST sys_admin could be different
>> 		"entity", so you debug CONT config and critical
>> 		needed information reach HOST (which you do not 
>> 		have access to).
> 
> Yes, so a privileged task on HOST must pass that information back to
> you on CONT.  That is not a valid complaint imo.  But how to sort the
> msgs out is a valid question.
> 
> We need some sort of identifier, unique system-wide, attached to.. something.
> Is ifindex unique system-wide right now?  Oh, IIRC it is, but we wnat it to
> be containerized, so that would be a bad choice :)
> 
>>> 	2. clamp down on syslog use by processes not in the init_user_ns
>> 		Could give me more detail??...
> 
> Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
> altogether from a container, or to only allow reading/writing messages
> to own syslog.  (I had hoped to find time to try out the second option but
> simply haven't had the time, and it doesn't look like I will very soon.
> So if anyone else wants to, pls jump at it...)
> 
> Then /proc/kmsg can provide what I described above through a FUSE file,
> and if, as you mentioned, the container unmounts the FUSE fs and gets
> to real procfs, they just get nothing.
> 
>>> 	3. let the userspace on the host copy messages into a socket or
>>> 	   file so child container can pretend it has real syslog.
>> 		So you trap printk message from CONT on the HOST and 
>> 		redirect them on CONT but on a standard syslog channel.
>> 		Seem OK to me, as long /proc/kmsg is not existing
>> 		(/dev/null) in the CONT file tree.


We have:
        * Commands to sys_syslog:
        *
        *      0 -- Close the log.  Currently a NOP.
        *      1 -- Open the log. Currently a NOP.
        *      2 -- Read from the log.
        *      3 -- Read all messages remaining in the ring buffer.
        *      4 -- Read and clear all messages remaining in the ring buffer
        *      5 -- Clear ring buffer.
        *      6 -- Disable printk to console
        *      7 -- Enable printk to console
        *      8 -- Set level of messages printed to console
        *      9 -- Return number of unread characters in the log buffer
        *     10 -- Return size of the log buffer

And add:
       *     11 -- Create a new ring buffer for the current process and
       *           its children


Let's say we keep the global ring buffer untouched, used by
syslog(2) and printk. When we create a new ring buffer, we allocate it
and assign it to the nsproxy (the global ring buffer is the default in the
nsproxy).

printk keeps writing to the global ring buffer and syslog(2)
writes to the "namespaced" ring buffer.

Does that make sense?
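
A rough user-space model of this proposal might look like the sketch
below. It is not kernel code. Command 11 is only the number proposed
in this mail; the Task and RingBuffer classes, the constant names and
the capacity are assumptions made for the example. Only "read all"
(command 3) and the proposed command 11 are modelled.

```python
from collections import deque

SYSLOG_ACTION_READ_ALL = 3      # from the existing command list
SYSLOG_ACTION_NEW_BUFFER = 11   # proposed in this thread, hypothetical


class RingBuffer:
    """Bounded message buffer; the oldest entries are dropped when full."""

    def __init__(self, capacity=16):
        self.messages = deque(maxlen=capacity)


GLOBAL_RING = RingBuffer()


def printk(msg):
    """In this proposal printk keeps writing to the global buffer."""
    GLOBAL_RING.messages.append(msg)


class Task:
    """Stand-in for a process; self.ring plays the role of the
    ring buffer pointer reached through nsproxy."""

    def __init__(self, parent=None):
        # Children inherit the parent's buffer; the default is global.
        self.ring = parent.ring if parent is not None else GLOBAL_RING

    def fork(self):
        return Task(parent=self)

    def syslog(self, action):
        if action == SYSLOG_ACTION_NEW_BUFFER:
            self.ring = RingBuffer()   # affects this task and later children
            return 0
        if action == SYSLOG_ACTION_READ_ALL:
            return list(self.ring.messages)
        raise ValueError("action not modelled in this sketch")
```

With this model, a container init that issues command 11 no longer sees
host printk output, and processes it forks afterwards share its buffer.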


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                       ` <4B695535.7020301-GANU6spQydw@public.gmane.org>
@ 2010-02-03 13:24                         ` Jean-Marc Pigeon
  2010-02-03 15:03                         ` Serge E. Hallyn
  1 sibling, 0 replies; 12+ messages in thread
From: Jean-Marc Pigeon @ 2010-02-03 13:24 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello
[..]

> 
> 
> We have:
>         * Commands to sys_syslog:
>         *
>         *      0 -- Close the log.  Currently a NOP.
>         *      1 -- Open the log. Currently a NOP.
>         *      2 -- Read from the log.
>         *      3 -- Read all messages remaining in the ring buffer.
>         *      4 -- Read and clear all messages remaining in the ring buffer
>         *      5 -- Clear ring buffer.
>         *      6 -- Disable printk to console
>         *      7 -- Enable printk to console
>         *      8 -- Set level of messages printed to console
>         *      9 -- Return number of unread characters in the log buffer
>         *     10 -- Return size of the log buffer
> 
> And add:
>        *     11 -- create a new ring buffer for the current process and 
> its childs
> 
> 
> We have, let's say a global ring buffer keep untouched, used by 
> syslog(2) and printk. When we create a new ring buffer, we allocate it 
> and assign to the nsproxy (global ring buffer is the default in the 
> nsproxy).
> 
> The prink keeps writing in the global ring buffer and the syslog(2) 
> writes to the "namespaced" ring buffer.
> 
> Does it makes sense ?
	I like this idea, as it gives us flexibility.

	Caution: iptables packet logging uses printk (AFAIK)
	to do tracing. iptables can be used within CONT:
	on top of the iptables within HOST: (IP filtering superset).
	So CONT:printk must write to the
	"namespaced" ring buffer too.

-- 
A bientôt
==========================================================================
Jean-Marc Pigeon                                   Internet: jmp@safe.ca
SAFE Inc.                                          Phone: (514) 493-4280
                                                   Fax:   (514) 493-1946
        Clement, 'a kiss solution' to get rid of SPAM (at last)
           Clement' Home base <"http://www.clement.safe.ca">
==========================================================================



* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                       ` <4B695535.7020301-GANU6spQydw@public.gmane.org>
  2010-02-03 13:24                         ` Jean-Marc Pigeon
@ 2010-02-03 15:03                         ` Serge E. Hallyn
       [not found]                           ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Serge E. Hallyn @ 2010-02-03 15:03 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> Serge E. Hallyn wrote:
> >Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> >>Hello,
> >>
> >>
> >>>I was wondering out loud about the best design to solve his problem.
> >>>
> >>>If we try to redirect kernel-generated messages to containers, we have
> >>>several problems, including whether we need to duplicate the messages
> >>>to the host container.  So in one sense it seems more flexible to
> >>>	1. send everything to host syslog
> >>		No, if we do that all CONTs message will reach
> >>		the same bucket and it will be difficult to sort
> >>		them out..
> >>		CONT sys_admin and HOST sys_admin could be different
> >>		"entity", so you debug CONT config and critical
> >>		needed information reach HOST (which you do not
> >>		have access to).
> >
> >Yes, so a privileged task on HOST must pass that information back to
> >you on CONT.  That is not a valid complaint imo.  But how to sort the
> >msgs out is a valid question.
> >
> >We need some sort of identifier, unique system-wide, attached to.. something.
> >Is ifindex unique system-wide right now?  Oh, IIRC it is, but we wnat it to
> >be containerized, so that would be a bad choice :)
> >
> >>>	2. clamp down on syslog use by processes not in the init_user_ns
> >>		Could give me more detail??...
> >
> >Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
> >altogether from a container, or to only allow reading/writing messages
> >to own syslog.  (I had hoped to find time to try out the second option but
> >simply haven't had the time, and it doesn't look like I will very soon.
> >So if anyone else wants to, pls jump at it...)
> >
> >Then /proc/kmsg can provide what I described above through a FUSE file,
> >and if, as you mentioned, the container unmounts the FUSE fs and gets
> >to real procfs, they just get nothing.
> >
> >>>	3. let the userspace on the host copy messages into a socket or
> >>>	   file so child container can pretend it has real syslog.
> >>		So you trap printk message from CONT on the HOST and
> >>		redirect them on CONT but on a standard syslog channel.
> >>		Seem OK to me, as long /proc/kmsg is not existing
> >>		(/dev/null) in the CONT file tree.
> 
> 
> We have:
>        * Commands to sys_syslog:
>        *
>        *      0 -- Close the log.  Currently a NOP.
>        *      1 -- Open the log. Currently a NOP.
>        *      2 -- Read from the log.
>        *      3 -- Read all messages remaining in the ring buffer.
>        *      4 -- Read and clear all messages remaining in the ring buffer
>        *      5 -- Clear ring buffer.
>        *      6 -- Disable printk to console
>        *      7 -- Enable printk to console
>        *      8 -- Set level of messages printed to console
>        *      9 -- Return number of unread characters in the log buffer
>        *     10 -- Return size of the log buffer
> 
> And add:
>       *     11 -- create a new ring buffer for the current process
> and its childs
> 
> 
> We have, let's say a global ring buffer keep untouched, used by
> syslog(2) and printk. When we create a new ring buffer, we allocate
> it and assign to the nsproxy (global ring buffer is the default in
> the nsproxy).
> 
> The prink keeps writing in the global ring buffer and the syslog(2)
> writes to the "namespaced" ring buffer.
> 
> Does it makes sense ?

Yeah, it's a nice alternative.  Though (1) there is something to be said for
forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
new ring buffer is pointed to from nsproxy, it might be frowned upon to do
an unshare/clone action in yet another way.

I still think our first concern should be safety, and that we should consider
just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
clone(CLONE_NEWUSER).  Any sys_syslog() or /proc/kmsg access returns -EINVAL
after that.  Then we can discuss whether and how to target printks to
namespaces, and whether duplicates should be sent to parent namespaces.
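
The "safety first" variant can be modelled in a few lines of user-space
code. This is a sketch of the idea only: the Nsproxy class, the
clone_nsproxy helper and the use of None for a NULL 'struct
syslog_struct' pointer are assumptions for illustration, not existing
kernel interfaces.

```python
import errno

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


class Nsproxy:
    """Stand-in for nsproxy; syslog=None models a NULL pointer."""

    def __init__(self, syslog):
        self.syslog = syslog


def clone_nsproxy(parent, flags):
    """Copy the parent's nsproxy, dropping syslog on CLONE_NEWUSER."""
    if flags & CLONE_NEWUSER:
        return Nsproxy(syslog=None)
    return Nsproxy(syslog=parent.syslog)


def sys_syslog(nsproxy, action):
    """Refuse every syslog action when there is no syslog_struct."""
    if nsproxy.syslog is None:
        return -errno.EINVAL
    return 0  # real work is not modelled here
```

The same check would apply to opening /proc/kmsg; only the sys_syslog()
path is shown.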

After we start getting flexible with syslog, the next request will be for
audit flexibility.  I don't even know how our netlink support suffices for
that right now.

(So, this all does turn into a big deal...)

-serge


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                           ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-03 15:48                             ` Jean-Marc Pigeon
       [not found]                               ` <1265212090.6260.284.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
  2010-02-04  9:33                             ` Daniel Lezcano
  1 sibling, 1 reply; 12+ messages in thread
From: Jean-Marc Pigeon @ 2010-02-03 15:48 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hello,

[...]
> > 
> > The prink keeps writing in the global ring buffer and the syslog(2)
> > writes to the "namespaced" ring buffer.
> > 
> > Does it makes sense ?
> 
> Yeah, it's a nice alternative.  Though (1) there is something to be said for
> forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> an unshare/clone action in yet another way.
> 
> I still think our first concern should be safety, and that we should consider
> just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> clone(CLONE_NEWUSER).  any sys_syslog() or /proc/kmsg access returns -EINVAL
> after that.  Then we can discuss whether and how to target printks to
> namespaces, and whether duplicates should be sent to parent namespaces.
	/proc/kmsg=-EINVAL will resolve the HOST: ring buffer corruption.
	Not sure what sys_syslog()=-EINVAL means??? rsyslog MUST be able to
	run within CONT:, right?

	Duplicating printk messages per namespace and sending them to the
	parent namespace is not a good idea (duplicating & forwarding is
	done by tools such as rsyslogd).
> 
> After we start getting flexible with syslog, the next request will be for
> audit flexibility.  I don't even know how our netlink support suffices for
> that right now.
> 
> (So, this all does turn into a big deal...)
> 
> -serge
-- 
A bientôt
==========================================================================
Jean-Marc Pigeon                                   Internet: jmp@safe.ca
SAFE Inc.                                          Phone: (514) 493-4280
                                                   Fax:   (514) 493-1946
        Clement, 'a kiss solution' to get rid of SPAM (at last)
           Clement' Home base <"http://www.clement.safe.ca">
==========================================================================



* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                               ` <1265212090.6260.284.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
@ 2010-02-03 16:21                                 ` Serge E. Hallyn
  0 siblings, 0 replies; 12+ messages in thread
From: Serge E. Hallyn @ 2010-02-03 16:21 UTC (permalink / raw)
  To: Jean-Marc Pigeon
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> Hello,
> 
> [...]
> > > 
> > > The prink keeps writing in the global ring buffer and the syslog(2)
> > > writes to the "namespaced" ring buffer.
> > > 
> > > Does it makes sense ?
> > 
> > Yeah, it's a nice alternative.  Though (1) there is something to be said for
> > forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> > new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> > an unshare/clone action in yet another way.
> > 
> > I still think our first concern should be safety, and that we should consider
> > just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> > clone(CLONE_NEWUSER).  any sys_syslog() or /proc/kmsg access returns -EINVAL
> > after that.  Then we can discuss whether and how to target printks to
> > namespaces, and whether duplicates should be sent to parent namespaces.
> 	/proc/kmsg=-EINVAL  will resolve the own HOST: ring buffer corruption
> 	not sure what sys_syslog()=-EINVAL mean???, rsyslog MUST be able to
> 	run within CONT: right?
> 
> 	printk namespaces duplicate and sent to parent namespace
> 	is not a good idea (duplicating&forwarding is done by tools as rsyslogd).

Heh, agreed, I just disagree that we should trust CONT to fwd stuff to
HOST, rather we should have HOST fwd stuff to CONT.

It comes down to whether HOST might need the info to determine what CONT
or someone attacking CONT is up to.  To the extent that HOST will never be
fully safe from CONT (and if you think it is, even using kvm/vmware, then
you're deluding yourself) I think it's misguided to keep information
from HOST.

> > After we start getting flexible with syslog, the next request will be for
> > audit flexibility.  I don't even know how our netlink support suffices for
> > that right now.
> > 
> > (So, this all does turn into a big deal...)
> > 
> > -serge


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                           ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2010-02-03 15:48                             ` Jean-Marc Pigeon
@ 2010-02-04  9:33                             ` Daniel Lezcano
       [not found]                               ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Lezcano @ 2010-02-04  9:33 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Serge E. Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
>   
>> Serge E. Hallyn wrote:
>>     
>>> Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
>>>       
>>>> Hello,
>>>>
>>>>
>>>>         
>>>>> I was wondering out loud about the best design to solve his problem.
>>>>>
>>>>> If we try to redirect kernel-generated messages to containers, we have
>>>>> several problems, including whether we need to duplicate the messages
>>>>> to the host container.  So in one sense it seems more flexible to
>>>>> 	1. send everything to host syslog
>>>>>           
>>>> 		No, if we do that all CONT messages will reach
>>>> 		the same bucket and it will be difficult to sort
>>>> 		them out..
>>>> 		CONT sys_admin and HOST sys_admin could be different
>>>> 		"entities", so you debug CONT config and critically
>>>> 		needed information reaches HOST (which you do not have
>>>> 		access to).
>>>>         
>>> Yes, so a privileged task on HOST must pass that information back to
>>> you on CONT.  That is not a valid complaint imo.  But how to sort the
>>> msgs out is a valid question.
>>>
>>> We need some sort of identifier, unique system-wide, attached to.. something.
>>> Is ifindex unique system-wide right now?  Oh, IIRC it is, but we want it to
>>> be containerized, so that would be a bad choice :)
>>>
>>>       
>>>>> 	2. clamp down on syslog use by processes not in the init_user_ns
>>>>>           
>>>> 		Could you give me more detail?
>>>>         
>>> Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
>>> altogether from a container, or to only allow reading/writing messages
>>> to own syslog.  (I had hoped to find time to try out the second option but
>>> simply haven't had the time, and it doesn't look like I will very soon.
>>> So if anyone else wants to, pls jump at it...)
>>>
>>> Then /proc/kmsg can provide what I described above through a FUSE file,
>>> and if, as you mentioned, the container unmounts the FUSE fs and gets
>>> to real procfs, they just get nothing.
>>>
>>>       
>>>>> 	3. let the userspace on the host copy messages into a socket or
>>>>> 	   file so child container can pretend it has real syslog.
>>>>>           
>>>> 		So you trap printk messages from CONT on the HOST and
>>>> 		redirect them to CONT, but on a standard syslog channel.
>>>> 		Seems OK to me, as long as /proc/kmsg does not exist
>>>> 		(/dev/null) in the CONT file tree.
>>>>         
>> We have:
>>        * Commands to sys_syslog:
>>        *
>>        *      0 -- Close the log.  Currently a NOP.
>>        *      1 -- Open the log. Currently a NOP.
>>        *      2 -- Read from the log.
>>        *      3 -- Read all messages remaining in the ring buffer.
>>        *      4 -- Read and clear all messages remaining in the ring buffer
>>        *      5 -- Clear ring buffer.
>>        *      6 -- Disable printk to console
>>        *      7 -- Enable printk to console
>>        *      8 -- Set level of messages printed to console
>>        *      9 -- Return number of unread characters in the log buffer
>>        *     10 -- Return size of the log buffer
>>
>> And add:
>>       *     11 -- create a new ring buffer for the current process
>>       *           and its children
>>
>>
>> We have, let's say, a global ring buffer kept untouched, used by
>> syslog(2) and printk. When we create a new ring buffer, we allocate
>> it and assign it to the nsproxy (the global ring buffer is the default
>> in the nsproxy).
>>
>> The printk keeps writing to the global ring buffer and the syslog(2)
>> writes to the "namespaced" ring buffer.
>>
>> Does it make sense?
>>     
>
> Yeah, it's a nice alternative.  Though (1) there is something to be said for
> forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> an unshare/clone action in yet another way.
>   
Why do you want to tie clone(CLONE_NEWUSER) to a new ring buffer?
I mean one may want to use CLONE_NEWUSER but keep the ring buffer, no?
> I still think our first concern should be safety, and that we should consider
> just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> clone(CLONE_NEWUSER).  any sys_syslog() or /proc/kmsg access returns -EINVAL
> after that.  Then we can discuss whether and how to target printks to
> namespaces, and whether duplicates should be sent to parent namespaces.
>   
It makes sense to do it step by step. Targeting the printk is the more
difficult part, no? I mean you should always have the destination namespace
available, which is not obvious when the printk is called from an
interrupt context.

> After we start getting flexible with syslog, the next request will be for
> audit flexibility.  I don't even know how our netlink support suffices for
> that right now.
>
> (So, this all does turn into a big deal...)
>   
Mmh ... right.
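
For illustration only, the "command 11" idea discussed above can be modeled
in plain userspace C. This is a rough sketch, not kernel code: every name in
it (log_ring, nsproxy_model, syslog_new_ring, printk_model, and so on) is
hypothetical, the ring size is arbitrary, and locking is ignored. It only
shows the routing described in the thread: printk always goes to the global
ring, while syslog(2) traffic goes to whichever ring the nsproxy points at.

```c
/* Userspace model only. All names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 256            /* arbitrary size chosen for the sketch */

struct log_ring {
    char   buf[RING_SIZE];
    size_t head;                 /* total bytes ever written */
    size_t tail;                 /* total bytes ever consumed */
};

/* Global ring: the default for every task and the only printk target. */
static struct log_ring global_ring;

struct nsproxy_model {
    struct log_ring *syslog_ring;    /* defaults to &global_ring */
};

static void nsproxy_init(struct nsproxy_model *ns)
{
    ns->syslog_ring = &global_ring;
}

static void ring_write(struct log_ring *r, const char *msg)
{
    size_t len = strlen(msg);

    for (size_t i = 0; i < len; i++)
        r->buf[r->head++ % RING_SIZE] = msg[i];
    /* On overrun, the oldest unread bytes are lost. */
    if (r->head - r->tail > RING_SIZE)
        r->tail = r->head - RING_SIZE;
}

static size_t ring_read(struct log_ring *r, char *out, size_t len)
{
    size_t n = 0;

    while (n < len && r->tail < r->head)
        out[n++] = r->buf[r->tail++ % RING_SIZE];
    return n;
}

/* Hypothetical "command 11": give this nsproxy a private ring buffer. */
static int syslog_new_ring(struct nsproxy_model *ns)
{
    struct log_ring *r = calloc(1, sizeof(*r));

    if (!r)
        return -1;
    ns->syslog_ring = r;
    return 0;
}

/* printk keeps writing to the global ring, whatever the caller's nsproxy. */
static void printk_model(const char *msg)
{
    ring_write(&global_ring, msg);
}

/* syslog(2)-style accesses use the ring the caller's nsproxy points at. */
static void syslog_write_model(struct nsproxy_model *ns, const char *msg)
{
    assert(ns && ns->syslog_ring);
    ring_write(ns->syslog_ring, msg);
}

static size_t syslog_read_model(struct nsproxy_model *ns, char *out, size_t len)
{
    assert(ns && ns->syslog_ring);
    return ring_read(ns->syslog_ring, out, len);
}
```

In this model a container that called syslog_new_ring() never sees
printk_model() output, which is exactly the open question in the thread:
targeting printk at a namespace needs a destination that is not available
in interrupt context.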



* Re: [Lxc-users] Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                               ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org>
@ 2010-02-04 15:19                                 ` Serge E. Hallyn
       [not found]                                   ` <20100204151927.GA7556-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Serge E. Hallyn @ 2010-02-04 15:19 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> Serge E. Hallyn wrote:
> >Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> >>Serge E. Hallyn wrote:
> >>>Quoting Jean-Marc Pigeon (jmp-4qkeo2rQ0gg@public.gmane.org):
> >>>>Hello,
> >>>>
> >>>>
> >>>>>I was wondering out loud about the best design to solve his problem.
> >>>>>
> >>>>>If we try to redirect kernel-generated messages to containers, we have
> >>>>>several problems, including whether we need to duplicate the messages
> >>>>>to the host container.  So in one sense it seems more flexible to
> >>>>>	1. send everything to host syslog
> >>>>		No, if we do that all CONT messages will reach
> >>>>		the same bucket and it will be difficult to sort
> >>>>		them out..
> >>>>		CONT sys_admin and HOST sys_admin could be different
> >>>>		"entities", so you debug CONT config and critically
> >>>>		needed information reaches HOST (which you do not have
> >>>>		access to).
> >>>Yes, so a privileged task on HOST must pass that information back to
> >>>you on CONT.  That is not a valid complaint imo.  But how to sort the
> >>>msgs out is a valid question.
> >>>
> >>>We need some sort of identifier, unique system-wide, attached to.. something.
> >>>Is ifindex unique system-wide right now?  Oh, IIRC it is, but we want it to
> >>>be containerized, so that would be a bad choice :)
> >>>
> >>>>>	2. clamp down on syslog use by processes not in the init_user_ns
> >>>>		Could you give me more detail?
> >>>Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
> >>>altogether from a container, or to only allow reading/writing messages
> >>>to own syslog.  (I had hoped to find time to try out the second option but
> >>>simply haven't had the time, and it doesn't look like I will very soon.
> >>>So if anyone else wants to, pls jump at it...)
> >>>
> >>>Then /proc/kmsg can provide what I described above through a FUSE file,
> >>>and if, as you mentioned, the container unmounts the FUSE fs and gets
> >>>to real procfs, they just get nothing.
> >>>
> >>>>>	3. let the userspace on the host copy messages into a socket or
> >>>>>	   file so child container can pretend it has real syslog.
> >>>>		So you trap printk messages from CONT on the HOST and
> >>>>		redirect them to CONT, but on a standard syslog channel.
> >>>>		Seems OK to me, as long as /proc/kmsg does not exist
> >>>>		(/dev/null) in the CONT file tree.
> >>We have:
> >>       * Commands to sys_syslog:
> >>       *
> >>       *      0 -- Close the log.  Currently a NOP.
> >>       *      1 -- Open the log. Currently a NOP.
> >>       *      2 -- Read from the log.
> >>       *      3 -- Read all messages remaining in the ring buffer.
> >>       *      4 -- Read and clear all messages remaining in the ring buffer
> >>       *      5 -- Clear ring buffer.
> >>       *      6 -- Disable printk to console
> >>       *      7 -- Enable printk to console
> >>       *      8 -- Set level of messages printed to console
> >>       *      9 -- Return number of unread characters in the log buffer
> >>       *     10 -- Return size of the log buffer
> >>
> >>And add:
> >>      *     11 -- create a new ring buffer for the current process
> >>      *           and its children
> >>
> >>
> >>We have, let's say, a global ring buffer kept untouched, used by
> >>syslog(2) and printk. When we create a new ring buffer, we allocate
> >>it and assign it to the nsproxy (the global ring buffer is the default
> >>in the nsproxy).
> >>
> >>The printk keeps writing to the global ring buffer and the syslog(2)
> >>writes to the "namespaced" ring buffer.
> >>
> >>Does it make sense?
> >
> >Yeah, it's a nice alternative.  Though (1) there is something to be said for
> >forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> >new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> >an unshare/clone action in yet another way.

> Why do you want to tie clone(CLONE_NEWUSER) to a new ring buffer?
> I mean one may want to use CLONE_NEWUSER but keep the ring buffer, no?

Hmm, well yesterday I was thinking no, but I guess you're right.  I may
be wanting to remap userids and not contain root.

I still like your syslog command 11, but assuming we want to keep the
syslog_ns on nsproxy, I think we really need to stick to clone/unshare.
So if we want to add a CLONE_SYSLOG flag, we have to wait until eclone
gets us more clone flags :)  Or, pull out the eclone patchset from
linux-cr and make it a prereq for this.

> >I still think our first concern should be safety, and that we should consider
> >just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> >clone(CLONE_NEWUSER).  any sys_syslog() or /proc/kmsg access returns -EINVAL
> >after that.  Then we can discuss whether and how to target printks to
> >namespaces, and whether duplicates should be sent to parent namespaces.
> It makes sense to do it step by step. Targeting the printk is the
> more difficult part, no? I mean you should always have the destination
> namespace available, which is not obvious when the printk is called
> from an interrupt context.
> 
> >After we start getting flexible with syslog, the next request will be for
> >audit flexibility.  I don't even know how our netlink support suffices for
> >that right now.
> >
> >(So, this all does turn into a big deal...)
> Mmh ... right.
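
As a rough illustration of the safety-first step quoted above (a NULL
syslog pointer in the nsproxy after clone(CLONE_NEWUSER), with every
sys_syslog() or /proc/kmsg access then returning -EINVAL), here is a small
userspace model in C. It is a sketch under assumptions, not a patch: the
names nsproxy_sketch, copy_namespaces_sketch and syslog_access_sketch are
hypothetical, and the contents of 'struct syslog_struct' are a placeholder
since the thread does not define them.

```c
/* Userspace model only. All names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag bit used by this model to stand in for CLONE_NEWUSER. */
#define SKETCH_CLONE_NEWUSER 0x10000000UL

/* Placeholder: the thread only names the struct, not its fields. */
struct syslog_struct {
    int unused;
};

struct nsproxy_sketch {
    struct syslog_struct *syslog;    /* NULL means "no syslog access" */
};

/* The initial (host) syslog state that every task starts out sharing. */
static struct syslog_struct init_syslog;

/*
 * Model of the namespace copy done at clone time: a new user namespace
 * loses its syslog pointer, everything else inherits the parent's.
 */
static void copy_namespaces_sketch(unsigned long flags,
                                   const struct nsproxy_sketch *parent,
                                   struct nsproxy_sketch *child)
{
    assert(parent && child);
    child->syslog = (flags & SKETCH_CLONE_NEWUSER) ? NULL : parent->syslog;
}

/* Common gate for sys_syslog() and /proc/kmsg style accesses. */
static int syslog_access_sketch(const struct nsproxy_sketch *ns)
{
    if (ns->syslog == NULL)
        return -EINVAL;
    return 0;
}
```

The point of the design is that the check is a single NULL test on the
nsproxy, so it can go in first, before anyone decides how printk should be
targeted at namespaces.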


* Re: Kernel 2.6.33-rc6, 3 bugs container specific.
       [not found]                                   ` <20100204151927.GA7556-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-04 16:02                                     ` Cedric Le Goater
  0 siblings, 0 replies; 12+ messages in thread
From: Cedric Le Goater @ 2010-02-04 16:02 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On 02/04/2010 04:19 PM, Serge E. Hallyn wrote:
> I still like your syslog command 11, but assuming we want to keep the
> syslog_ns on nsproxy, I think we really need to stick to clone/unshare.

Yes. Let's keep the namespace creation API consistent. This is complex
enough.

C.



end of thread, other threads:[~2010-02-04 16:02 UTC | newest]

Thread overview: 12+ messages
     [not found] <1265074676.6260.212.camel@Mercier.safe.ca>
     [not found] ` <20100202031647.GA14318@fqdn.org>
     [not found]   ` <1265121846.6260.231.camel@Mercier.safe.ca>
     [not found]     ` <4B68649D.2000503@free.fr>
     [not found]       ` <4B68649D.2000503-GANU6spQydw@public.gmane.org>
2010-02-02 18:18         ` Kernel 2.6.33-rc6, 3 bugs container specific Serge E. Hallyn
     [not found]           ` <20100202181801.GA28412-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-02 18:43             ` Jean-Marc Pigeon
     [not found]               ` <1265136215.6260.261.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
2010-02-02 21:32                 ` Serge E. Hallyn
     [not found]                   ` <20100202213254.GH32305-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-03 10:51                     ` Daniel Lezcano
     [not found]                       ` <4B695535.7020301-GANU6spQydw@public.gmane.org>
2010-02-03 13:24                         ` Jean-Marc Pigeon
2010-02-03 15:03                         ` Serge E. Hallyn
     [not found]                           ` <20100203150350.GA7146-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-03 15:48                             ` Jean-Marc Pigeon
     [not found]                               ` <1265212090.6260.284.camel-4BUXZ/Ty1v7iqR6jatDSCA@public.gmane.org>
2010-02-03 16:21                                 ` Serge E. Hallyn
2010-02-04  9:33                             ` Daniel Lezcano
     [not found]                               ` <4B6A9461.1010309-GANU6spQydw@public.gmane.org>
2010-02-04 15:19                                 ` [Lxc-users] " Serge E. Hallyn
     [not found]                                   ` <20100204151927.GA7556-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-04 16:02                                     ` Cedric Le Goater
2010-02-02 14:46 Jean-Marc Pigeon
