* LXC L3 network isolation, yes/no ?, how ?
@ 2011-11-01  2:12 Toerless Eckert
       [not found] ` <20111101021230.GE15906-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Toerless Eckert @ 2011-11-01  2:12 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I am trying to understand whether (and if so, how) I can use LXC (or any
other comparable lightweight container option) to run applications on a
Linux system with two separate IP interfaces as if each application had
access to only a single IP interface.

Eg:
    eth0 with address and default-router learned by DHCP
    eg: address 10.1.1.2/24, default-router 10.1.1.254
    DNS prefix and DNS domain name for eth0 of course also learned by DHCP.

    eth1 with address and default-router learned by DHCP
    eg: address 10.2.1.2/24, default-router 10.2.1.254
    DNS prefix and DNS domain name for eth1 of course also learned by DHCP.

    (no need for overlapping addresses).

So, I configure LXC accordingly (how...) for one eth0container and one
eth1container. All processes running in eth0container will have all their
traffic use only eth0, and all the ones in eth1container will use only eth1.

If this works, I'd love to get a pointer to an example config. The
ones I could find on the web looked as if they were using bridging
to attach multiple containers to ultimately the same single IP subnet
with the same default router (and thereby the same DNS prefix and DNS servers).

I can't see how LXC can make my case work without some additional kernel
support, because when either process1 or process2 opens, let's say, a
client socket and just calls connect(), then (AFAIK) the default Linux routing
logic takes place, which would (AFAIK) first figure out where to route the
destination to (eth0 or eth1) and then pick the local IP address of that
interface as the socket's local IP address. And I don't understand how
LXC would make this decision process dependent on which container the process
is running in.

I guess one can create additional routing tables, one for each container,
and then use the fwmark on all sockets to have them use that container-specific
routing table, but it's not clear to me whether/how that is really
done in LXC.
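
The routing-table half of that fwmark idea can be sketched with plain
iproute2 policy routing. This is only an illustration under assumptions:
the function name, the mark values and the table numbers are made up for
the example, and the function prints the commands instead of running them.
Getting the mark onto the packets in the first place (for example with
setsockopt(SO_MARK) or a packet-filter rule) is not covered, and source
address selection may still need separate attention.

```shell
# Print (do not run) the policy-routing commands for one "container".
# Arguments: fwmark value, routing table number, gateway, device.
# Mark and table numbers are arbitrary values chosen for this sketch.
print_mark_routing() {
    mark=$1
    table=$2
    gw=$3
    dev=$4
    # Default route that exists only in the per-container table.
    echo "ip route add default via $gw dev $dev table $table"
    # Packets carrying this fwmark are looked up in that table.
    echo "ip rule add fwmark $mark table $table"
}
```

For the addresses in this mail one might call
print_mark_routing 1 101 10.1.1.254 eth0 and
print_mark_routing 2 102 10.2.1.254 eth1, review the output, and then
run the printed commands as root.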

Thanks a lot!
    Toerless


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found] ` <20111101021230.GE15906-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
@ 2011-11-01  3:19   ` Eric W. Biederman
       [not found]     ` <m1r51swmun.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2011-11-01  3:19 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Toerless Eckert <Toerless.Eckert-vrlraubKdiR4tiELkoLHDcSSVFg4/55HhC4ANOJQIlc@public.gmane.org> writes:

> I am trying to understand whether (and if so, how) I can use LXC (or any
> other comparable lightweight container option) to run applications on a
> Linux system with two separate IP interfaces as if each application had
> access to only a single IP interface.
>
> Eg:
>     eth0 with address and default-router learned by DHCP
>     eg: address 10.1.1.2/24, default-router 10.1.1.254
>     DNS prefix and DNS domain name for eth0 of course also learned by DHCP.
>
>     eth1 with address and default-router learned by DHCP
>     eg: address 10.2.1.2/24, default-router 10.2.1.254
>     DNS prefix and DNS domain name for eth1 of course also learned by DHCP.
>
>     (no need for overlapping addresses).

That sounds like L2 level isolation.

ip link set eth1 netns XXXX

will let you move a network device into a chosen network namespace.

That is the easy, trivial case.  Most people don't have multiple
physical interfaces, so tricky things have to happen.
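
For the two-interface case from the question, the move could look roughly
like the sketch below. It only prints the commands so they can be reviewed
first; the function name and the namespace name used in the usage line are
placeholders, and the "ip netns" subcommands assume a reasonably recent
iproute2.

```shell
# Print the commands that give one physical NIC its own network namespace.
# Arguments: namespace name, device name.
print_ns_setup() {
    ns=$1
    dev=$2
    echo "ip netns add $ns"                       # create a named namespace
    echo "ip link set $dev netns $ns"             # move the NIC into it
    echo "ip netns exec $ns ip link set lo up"    # loopback starts down in a new namespace
    echo "ip netns exec $ns ip link set $dev up"  # a moved device arrives down
}
```

Calling print_ns_setup ns1 eth1 prints four commands to run as root. Once
the device has been moved, it is no longer visible in the original
namespace.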

Does that sound like what you are looking for?

Eric


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]     ` <m1r51swmun.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2011-11-01  4:32       ` Toerless Eckert
       [not found]         ` <20111101043201.GA14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Toerless Eckert @ 2011-11-01  4:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Toerless Eckert

Thanks, Eric

How do I configure, e.g., an LXC container to use a specific network namespace XXXX?

Also: if an app within some LXC container does a socket() and then a
bind(..INADDR_ANY...), how does the kernel know which subset of IP interfaces
it should bind to? Does the process context have a network namespace?

And how do I create per-namespace routing tables?

An example or a pointer to docs would be great. Or just walk me through the rough
outline of my use case:

  - create container e0procs, configure just the physical eth0 interface into it ??
    - without assigning an IP address ?
    - run a DHCP daemon from within container e0procs, and that
      will correctly get the IP address/mask and default route configured in a
      routing table used solely by container e0procs ?
    - container e0procs' DHCP daemon will also populate the containerized /etc/resolv.conf with
      eth0 domain prefix/DNS servers...

  - same approach for container c1procs: configure the physical eth1 interface into it,
    start a DHCP daemon inside the container, and get the routing table and DNS
    for container c1procs from it.

Is that it? If not, then how? If yes, then what type of routing table would
I actually see outside of the containers? And back to the original question:
would socket(), bind(INADDR_ANY) from inside the containers work correctly?

Thanks
    Toerless


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]         ` <20111101043201.GA14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
@ 2011-11-01 12:20           ` Eric W. Biederman
       [not found]             ` <m1lis0vxu6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2011-11-01 12:20 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Toerless Eckert <Toerless.Eckert-vrlraubKdiR4tiELkoLHDcSSVFg4/55HhC4ANOJQIlc@public.gmane.org> writes:

> Thanks, Eric
>
> How do I configure, e.g., an LXC container to use a specific network namespace XXXX?
>
> Also: if an app within some LXC container does a socket() and then a
> bind(..INADDR_ANY...), how does the kernel know which subset of IP interfaces
> it should bind to? Does the process context have a network namespace?

Yes, the network namespace. Each process runs in exactly one, and only the
interfaces in that namespace are considered.

> And how do I create per-namespace routing tables?

Just like normal.  From inside the network namespace you set up your
routing tables.

> An example or a pointer to docs would be great. Or just walk me through the rough
> outline of my use case:
>
>   - create container e0procs, configure just the physical eth0 interface into it ??
>     - without assigning an IP address ?
>     - run a DHCP daemon from within container e0procs, and that
>       will correctly get the IP address/mask and default route configured in a
>       routing table used solely by container e0procs ?
>     - container e0procs' DHCP daemon will also populate the containerized /etc/resolv.conf with
>       eth0 domain prefix/DNS servers...
>
>   - same approach for container c1procs: configure the physical eth1 interface into it,
>     start a DHCP daemon inside the container, and get the routing table and DNS
>     for container c1procs from it.
>
> Is that it? If not, then how? If yes, then what type of routing table would
> I actually see outside of the containers? And back to the original question:
> would socket(), bind(INADDR_ANY) from inside the containers work correctly?


Yes.  bind(INADDR_ANY) works correctly inside a network namespace.

A network namespace is, from an application's perspective, like having a
separate copy of the networking stack.
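
To see that separation for the use case outlined above, something like the
following sketch could be used once the interface has been moved into a
namespace. It prints the commands rather than running them. The function
name is invented for the example, and dhclient is only an assumption about
which DHCP client is installed; how resolv.conf ends up populated depends
on that client's hook scripts and is not shown.

```shell
# Print commands that configure one namespace via DHCP and then compare
# the routing table inside the namespace with the one outside.
# Arguments: namespace name, device already moved into that namespace.
print_ns_dhcp_check() {
    ns=$1
    dev=$2
    echo "ip netns exec $ns dhclient $dev"   # DHCP client runs inside the namespace
    echo "ip netns exec $ns ip route show"   # routes learned for this namespace only
    echo "ip route show"                     # host table: no routes via the moved device
}
```

The last command answers the "what would I see outside" question: routes
through a device disappear from the original namespace when the device is
moved away.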

Eric


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]             ` <m1lis0vxu6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2011-11-01 15:26               ` Toerless Eckert
       [not found]                 ` <20111101152624.GB14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Toerless Eckert @ 2011-11-01 15:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Toerless Eckert

Thanks for replying,

Sorry for asking what probably are a lot of naive questions, my excuse is
that the documentation is somewhat scattered/incomplete ? ;-))

I am trying to figure out how to minimize the virtualization to just the network
namespace and instantiate it in a lightweight fashion that can easily
be retrofitted into some existing system.

What I would like to have is some simple program like "run-ns XXXX <program> <args>"
that would run <program> <args> within namespace XXXX.

So I was looking for some system call like set_ns(XXXX), but it seems there
is no API like that. Instead, I guess I would need to have a "server" process
with pid XXXX that does an unshare(CLONE_NEWNET) and then listens for requests
to fork client programs, and run-ns would need to send a request to that XXXX
process to fork off <program> <args> and make sure that it can transfer all
the pre-existing context of run-ns like uid/gid(s), cwd, environment, and I don't
even know all the other context a Linux process has these days. And then of course
communicate the exit status of <program> back from XXXX to run-ns.

Meaning: it's great to have something like network namespaces, but without
some setns(XXXX) system call, it's really difficult to use these network
namespaces outside of a concept like LXC, which is a shame, because otherwise
the network namespace would be exactly what I am looking for.

I guess I will have to look at how much isolated network behavior I can
get by using fwmarks. Alas, there is no process-level fwmark context;
it has to be set via setsockopt(SO_MARK) AFAIK, so one would need some
LD_PRELOAD library or the like to use it.

*sigh* ;-))

Cheers
    Toerless


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]                 ` <20111101152624.GB14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
@ 2011-11-01 15:55                   ` Daniel Lezcano
  2011-11-01 17:17                   ` Eric W. Biederman
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Lezcano @ 2011-11-01 15:55 UTC (permalink / raw)
  To: Toerless Eckert
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 11/01/2011 04:26 PM, Toerless Eckert wrote:
> Thanks for replying,
>
> Sorry for asking what probably are a lot of naive questions, my excuse is
> that the documentation is somewhat scattered/incomplete ? ;-))
>
> I am trying to figure out how to minimize the virtualization to just the network
> namespace and instantiate it in a lightweight fashion that can easily
> be retrofitted into some existing system.
>
> What I would like to have is some simple program like "run-ns XXXX <program> <args>"
> that would run <program> <args> within namespace XXXX.

Did you look at the lxc-execute command ?

http://lxc.sourceforge.net/man/lxc.html

the "Quick Start" section, third line.

   -- Daniel


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]                 ` <20111101152624.GB14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
  2011-11-01 15:55                   ` Daniel Lezcano
@ 2011-11-01 17:17                   ` Eric W. Biederman
       [not found]                     ` <m1hb2nsqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2011-11-01 17:17 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Toerless Eckert <Toerless.Eckert-vrlraubKdiR4tiELkoLHDcSSVFg4/55HhC4ANOJQIlc@public.gmane.org> writes:

> Thanks for replying,
>
> Sorry for asking what probably are a lot of naive questions, my excuse is
> that the documentation is somewhat scattered/incomplete ? ;-))
>
> I am trying to figure out how to minimize the virtualization to just the network
> namespace and instantiate it in a lightweight fashion that can easily
> be retrofitted into some existing system.
>
> What I would like to have is some simple program like "run-ns XXXX <program> <args>"
> that would run <program> <args> within namespace XXXX.
>
> So I was looking for some system call like set_ns(XXXX), but it seems there
> is no API like that. Instead, I guess I would need to have a "server" process
> with pid XXXX that does an unshare(CLONE_NEWNET) and then listens for requests
> to fork client programs, and run-ns would need to send a request to that XXXX
> process to fork off <program> <args> and make sure that it can transfer all
> the pre-existing context of run-ns like uid/gid(s), cwd, environment, and I don't
> even know all the other context a Linux process has these days. And then of course
> communicate the exit status of <program> back from XXXX to run-ns.
>
> Meaning: it's great to have something like network namespaces, but without
> some setns(XXXX) system call, it's really difficult to use these network
> namespaces outside of a concept like LXC, which is a shame, because otherwise
> the network namespace would be exactly what I am looking for.

Definitely old docs.

ip netns add
ip netns delete
ip netns exec

And yes there is a setns system call.

If you don't have that you have old bits.  All of that should be merged
and documented.
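
With those pieces, the "run-ns" program asked about above reduces to a thin
wrapper around ip netns exec. The sketch below is hypothetical: the name
run_ns and the RUN_NS_DRY switch are invented for the example. It assumes
the namespace was created with ip netns add and that the caller has the
privileges ip netns exec needs.

```shell
# run_ns NAMESPACE PROGRAM [ARGS...]
# Hypothetical wrapper matching the "run-ns" idea from this thread.
# With RUN_NS_DRY=1 it only prints the command it would execute.
run_ns() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_ns NAMESPACE PROGRAM [ARGS...]" >&2
        return 2
    fi
    if [ "${RUN_NS_DRY:-0}" = 1 ]; then
        echo "ip netns exec $*"
        return 0
    fi
    # ip netns exec enters the namespace and runs the program there;
    # the wrapper returns whatever status that command returns.
    ip netns exec "$@"
}
```

For example, RUN_NS_DRY=1 run_ns e0procs ping -c 1 10.1.1.254 shows the
command line, and dropping RUN_NS_DRY runs it. Because the program is
started directly from the calling shell, cwd and environment carry over
without any helper server process.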

Eric


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]                     ` <m1hb2nsqy6.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2011-11-02 19:51                       ` Toerless Eckert
       [not found]                         ` <20111102195142.GC14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Toerless Eckert @ 2011-11-02 19:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Toerless Eckert

Cool. Although I would claim my bits are "current" and your bits
are "bleeding edge". I only just found an iproute2 package that supports this
on my Gentoo by getting the latest CVS version... ;-)

The biggest issue seems to be that setns() is only in Linux kernels 3.0 and
later, as far as I can see. I have to check whether that's a possible version on the
systems where I need it.
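
A quick way to check that kernel requirement on a given system might be
the sketch below. The function name kernel_has_setns is made up for the
example, and it only looks at the major version of the release string, so
a vendor kernel with a backported setns() would be reported as too old.

```shell
# Succeed if a kernel release string is 3.0 or newer, which is where
# mainline gained setns(). Defaults to the running kernel.
# Assumption: the release string starts with "MAJOR." as uname -r prints it.
kernel_has_setns() {
    release=${1:-$(uname -r)}
    major=${release%%.*}        # text before the first dot
    [ "$major" -ge 3 ]
}
```

Usage could be as simple as
kernel_has_setns && echo "setns() should be available".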

But at least this is technically cool and makes these network namespaces
much more flexibly usable (e.g. inside and outside of LXC).

Cheers
    Toerless


* Re: LXC L3 network isolation, yes/no ?, how ?
       [not found]                         ` <20111102195142.GC14734-+4JsuViRYHWM0MU9lROt9PpTrGXM5HoexJJUWDj/nkeELgA04lAiVw@public.gmane.org>
@ 2011-11-02 20:11                           ` Renato Westphal
  0 siblings, 0 replies; 9+ messages in thread
From: Renato Westphal @ 2011-11-02 20:11 UTC (permalink / raw)
  To: Toerless Eckert
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

2011/11/2 Toerless Eckert <Toerless.Eckert-jNDFPZUTrfT6U6xlzOR6HsSSVFg4/55HhC4ANOJQIlc@public.gmane.org>:
> Cool. Although I would claim my bits are "current" and your bits
> are "bleeding edge". I only just found an iproute2 package that supports this
> on my Gentoo by getting the latest CVS version... ;-)
>
> The biggest issue seems to be that setns() is only in Linux kernels 3.0 and
> later, as far as I can see. I have to check whether that's a possible version on the
> systems where I need it.

Backporting the setns syscall and related stuff to older Linux kernels
is straightforward. I backported it to the 2.6.35.13 release and
everything is working fine. If you are interested, let me know.


-- 
Renato Westphal
