Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc

netdev.vger.kernel.org archive mirror

* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]         ` <CALCETrUkFD0iNi1SV_6ypN5Kf4GYybT5tzjRjRQuLzT9iBnQAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-08-25 13:30           ` Nicolas Dichtel
       [not found]             ` <53FB3A86.2060203-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2014-08-25 13:30 UTC
  To: Andy Lutomirski, Richard Guy Briggs
  Cc: Serge E. Hallyn, Linux API, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman, netdev

On 24/08/2014 19:52, Andy Lutomirski wrote:
> On Thu, Aug 21, 2014 at 6:58 PM, Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On 14/08/21, Andy Lutomirski wrote:
>>> On Aug 20, 2014 8:12 PM, "Richard Guy Briggs" <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>> Expose the namespace instance serial numbers in the proc filesystem at
>>>> /proc/<pid>/ns/<ns>_snum.  The link text gives the serial number in hex.
>>>
>>> What's the use case?
>>>
>>> I understand the utility of giving unique numbers to the audit code,
>>> but I don't think this part is necessary for that, and I'd like to
>>> understand what else will use this before committing to a duplicative
>>> API like this.
>>
>> How does a container manager get those numbers?  It could provoke a task
>> to cause an audit event that emits a NS_INFO message, or it could run a
>> task in that container to report its namespace serial numbers directly
>> from its /proc mount.
>
> Why does a container manager need them?  Is there any reason that
> keeping them entirely contained within the audit system would be a
> problem?
>
>>
>> The discussion in this thread touches on the use cases:
>>          https://lkml.org/lkml/2014/4/22/662
>>
>>> Note that this API is thoroughly incompatible with CRIU.  If we do
>>> this, someone will ask for a namespace number namespace, and that way
>>> lies madness.
>>
>> I had a very brief look at CRIU, but not enough to understand the issue.
>> Others have hinted at this problem.
>>
>> Do you have a suggestion of a different approach that would be
>> compatible with CRIU?
>>
>> I'd originally considered some sort of UUID that would be globally
>> unique, but that would be very hard to devise or guarantee, and besides,
>> namespaces aren't only used by containers and could be shared in other
>> ways.  Tracking the usage and migration of namespaces should be the task
>> of an upper layer.
>>
>
> CRIU wants to save the complete state of a namespace and then restore
> it.  For that to work, any information exposed to things in the
> namespace *cannot* be globally unique or unique per boot, since CRIU
> needs to arrange for that information to match whatever it was when
> CRIU saved it.
How are the ifindexes of network devices managed? They are unique per boot,
and thus can change depending on the order in which netdevs are created,
yet they are exposed to userspace ...

>
> Also, I think that code running in a namespace has no business even
> knowing a unique identity of that namespace from the perspective of
> the host.
Another scenario is when virtual network devices span two netns. You need to
identify the peer netns for a netlink message to be fully interpretable by
userspace.


Regards,
Nicolas


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]             ` <53FB3A86.2060203-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
@ 2014-08-25 14:04               ` Andy Lutomirski
       [not found]                 ` <CALCETrW1Lv0qeccMjNHSEzgtiaNN3NgJVR1dFjjR_dw5KVVnqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2014-08-25 14:04 UTC
  To: nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w
  Cc: Linux API, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman, netdev

On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:
>> CRIU wants to save the complete state of a namespace and then restore
>> it.  For that to work, any information exposed to things in the
>> namespace *cannot* be globally unique or unique per boot, since CRIU
>> needs to arrange for that information to match whatever it was when
>> CRIU saved it.
>
> How are ifindex of network devices managed? These ifindexes are unique per boot,
> thus can change depending on the order in which netdev are created.
> These ifindexes are unique per boot and exposed to userspace ...
>

This does not appear to be true.

$ sudo unshare --net
# ip link add veth0 type veth peer name veth1
# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
# logout
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN qlen 1000

>
>>
>> Also, I think that code running in a namespace has no business even
>> knowing a unique identity of that namespace from the perspective of
>> the host.
>
> Another scenario is when you have virtual network devices across two netns. You
> need to identify the peer netns to have a netlink message which is fully interpretable by the userspace.

Let me try again, with emphasis in the right place.

I think that *code running in a namespace* has no business even
knowing a unique identity of *that namespace* from the perspective of
the host.

In your example, if there's a veth device between netns A and netns B,
then code *in netns A* has no business knowing the identity of its
veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
no business knowing the identity of its own netns (A) other than as
"my netns".

If A and B are siblings, then their parent needs to know where that
veth device goes, but I think this is already the case to a sufficient
extent today.

I feel like this discussion is falling into a common trap of new API
discussions.  Can one of you who wants this API please articulate,
with a reasonably precise example, what it is that you want to do, why
you can't easily do it already, and how this API helps?  I currently
understand how the API creates problems, but I don't understand how it
solves any problems, and I will NAK it (and I suspect that Eric will,
too, which is pretty much fatal) unless that changes.

Thanks,
Andy

>
>
> Regards,
> Nicolas


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]                 ` <CALCETrW1Lv0qeccMjNHSEzgtiaNN3NgJVR1dFjjR_dw5KVVnqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-08-25 15:43                   ` Nicolas Dichtel
       [not found]                     ` <53FB59A3.5030804-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2014-08-25 15:43 UTC
  To: Andy Lutomirski
  Cc: Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Eric W. Biederman,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Richard Guy Briggs,
	netdev

On 25/08/2014 16:04, Andy Lutomirski wrote:
> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:
>>> CRIU wants to save the complete state of a namespace and then restore
>>> it.  For that to work, any information exposed to things in the
>>> namespace *cannot* be globally unique or unique per boot, since CRIU
>>> needs to arrange for that information to match whatever it was when
>>> CRIU saved it.
>>
>> How are ifindex of network devices managed? These ifindexes are unique per boot,
>> thus can change depending on the order in which netdev are created.
>> These ifindexes are unique per boot and exposed to userspace ...
>>
>
> This does not appear to be true.
>
> $ sudo unshare --net
> # ip link add veth0 type veth peer name veth1
> # ip link
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
> DEFAULT group default qlen 1000
>      link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
> DEFAULT group default qlen 1000
>      link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
> # logout
> $ ip link
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
> state DOWN qlen 1000
>
I've probably misunderstood what you're trying to say. ifindexes are unique per
boot and per netns. These ifindexes depend on the interface creation order:

$ ip netns add 1
$ ip link set eth1 netns 1
$ ip netns exec 1 ip link add veth0 type veth peer name veth1
$ ip netns exec 1 ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 9a:a0:89:99:a0:3c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
4: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 96:86:44:49:ce:a8 brd ff:ff:ff:ff:ff:ff
$ ip netns del 1
$ ip netns add 1
$ ip netns exec 1 ip link add veth0 type veth peer name veth1
$ ip link set eth1 netns 1
$ ip netns exec 1 ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 86:92:90:01:32:6b brd ff:ff:ff:ff:ff:ff
3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether ae:8b:d2:71:48:a2 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group 
default qlen 1000
     link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff

Note: when an interface is moved to another netns, its ifindex is kept if
possible; otherwise another ifindex is chosen.
I will dig a bit to understand how CRIU saves this netns information.
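
The allocation behaviour seen in the two transcripts above can be reproduced
with a small toy model. This is only a sketch under stated assumptions, not
the kernel's code: the class name NetNs and its methods are hypothetical, and
eth1 is assumed to have ifindex 3 in the original netns (which is what the
first transcript suggests, since the index was kept).

```python
class NetNs:
    """Toy model of one network namespace's interface table (hypothetical)."""

    def __init__(self):
        self.devices = {}   # ifindex -> interface name
        self._last = 0      # last index handed out in this namespace
        self.create("lo")   # a new netns starts with a loopback device

    def _next_free_index(self):
        # Walk upward from the last allocated index until a free slot is found.
        index = self._last
        while True:
            index += 1
            if index not in self.devices:
                self._last = index
                return index

    def create(self, name):
        """Create a device in this netns; it gets the next free ifindex."""
        index = self._next_free_index()
        self.devices[index] = name
        return index

    def move_in(self, name, old_index):
        """Move a device in from another netns, keeping its ifindex if free."""
        if old_index in self.devices:
            index = self._next_free_index()
        else:
            index = old_index
        self.devices[index] = name
        return index


def first_run():
    # eth1 is moved in first, then the veth pair is created (peer first).
    ns = NetNs()
    ns.move_in("eth1", 3)   # assumed ifindex 3 in the original netns
    ns.create("veth1")
    ns.create("veth0")
    return ns.devices


def second_run():
    # The veth pair is created first, then eth1 is moved in.
    ns = NetNs()
    ns.create("veth1")
    ns.create("veth0")
    ns.move_in("eth1", 3)
    return ns.devices
```

With these assumptions, first_run() leaves eth1 at index 3 and veth0 at
index 4, while second_run() leaves veth0 at index 3 and pushes eth1 to
index 4, matching the two listings.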

>>
>>>
>>> Also, I think that code running in a namespace has no business even
>>> knowing a unique identity of that namespace from the perspective of
>>> the host.
>>
>> Another scenario is when you have virtual network devices across two netns. You
>> need to identify the peer netns to have a netlink message which is fully interpretable by the userspace.
>
> Let me try again, with emphasis in the right place.
>
> I think that *code running in a namespace* has no business even
> knowing a unique identity of *that namespace* from the perspective of
> the host.
>
> In your example, if there's a veth device between netns A and netns B,
> then code *in netns A* has no business knowing the identity of its
> veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
> no business knowing the identity of its own netns (A) other than as
> "my netns".
I do not agree (see the example below).

>
> If A and B are siblings, then their parent needs to know where that
> veth device goes, but I think this is already the case to a sufficient
> extent today.
I'm not aware of a hierarchy between netns. A daemon should be able to
get the full network configuration even if it is started after this
configuration has been applied, i.e. even if it doesn't know what happened
before it started.

>
> I feel like this discussion is falling into a common trap of new API
> discussions.  Can one of you who wants this API please articulate,
> with a reasonably precise example, what it is that you want to do, why
> you can't easily do it already, and how this API helps?  I currently
> understand how the API creates problems, but I don't understand how it
> solves any problems, and I will NAK it (and I suspect that Eric will,
> too, which is pretty much fatal) unless that changes.
What I'm trying to solve is getting full information in the netlink messages
sent by the kernel, so that a peer netns can be identified (this is close to
what the audit people are trying to achieve). Theoretically, messages sent by
the kernel can be reused as-is to recreate the same configuration. This is not
the case with x-netns devices. Here is an example, with IP tunnels:

$ ip netns add 1
$ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
$ ip -d link ls ipip1
8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT 
group default
     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
     ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
$ ip link set ipip1 netns 1
$ ip netns exec 1 ip -d link ls ipip1
8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN mode 
DEFAULT group default
     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
     ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc

Now the information reported by 'ip link' is wrong and incomplete:
  - the link dev is now tunl0 instead of eth0, because we only get an ifindex
    from the kernel, without any netns information.
  - the encapsulation addresses are not part of this netns, but the user doesn't
    know that (again because netns info is missing). These IPv4 addresses may
    also exist in this netns.
  - it's not possible to recreate the same netdevice from this information.
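
The first point can be illustrated with a minimal sketch. The index values
and the table contents below are hypothetical (the transcript does not show
them); the only thing taken from the example is that the same link ifindex
names eth0 in the original netns and tunl0 in netns 1. The netns label
passed to resolve_qualified() is likewise an assumption, standing in for
whatever identifier the kernel could attach to the message.

```python
# Hypothetical per-netns ifindex tables: index 2 is eth0 in the original
# netns and tunl0 in netns 1.
TABLES = {
    "host": {1: "lo", 2: "eth0", 8: "ipip1"},
    "1": {1: "lo", 2: "tunl0", 8: "ipip1"},
}

# ipip1's lower device, recorded as a bare ifindex valid in the "host" netns.
LINK_IFINDEX = 2


def resolve_local(viewer_netns, link_ifindex):
    """All userspace can do with a bare ifindex: look it up in its own netns."""
    return TABLES[viewer_netns].get(link_ifindex, "?")


def resolve_qualified(link_netns, link_ifindex):
    """Lookup when the message also says which netns the ifindex belongs to."""
    return TABLES[link_netns].get(link_ifindex, "?")
```

Here resolve_local("1", LINK_IFINDEX) yields tunl0, the misleading name seen
from netns 1, whereas resolve_qualified("host", LINK_IFINDEX) yields eth0.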


I hope it's clearer now.


Regards,
Nicolas


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]                     ` <53FB59A3.5030804-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
@ 2014-08-25 16:13                       ` Andy Lutomirski
       [not found]                         ` <CALCETrWHrWhm89B5s=pLt_9eTx3ZF8ifA6y6CwknWaWU7dp=sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2014-08-25 16:13 UTC
  To: Nicolas Dichtel
  Cc: Linux API, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman, netdev

On Mon, Aug 25, 2014 at 8:43 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> On 25/08/2014 16:04, Andy Lutomirski wrote:
>
>> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel@6wind.com>
>> wrote:
>>>>
>>>> CRIU wants to save the complete state of a namespace and then restore
>>>> it.  For that to work, any information exposed to things in the
>>>> namespace *cannot* be globally unique or unique per boot, since CRIU
>>>> needs to arrange for that information to match whatever it was when
>>>> CRIU saved it.
>>>
>>>
>>> How are ifindex of network devices managed? These ifindexes are unique
>>> per boot,
>>> thus can change depending on the order in which netdev are created.
>>> These ifindexes are unique per boot and exposed to userspace ...
>>>
>>
>> This does not appear to be true.
>>
>> $ sudo unshare --net
>> # ip link add veth0 type veth peer name veth1
>> # ip link
>> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
>> default
>>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>> DEFAULT group default qlen 1000
>>      link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
>> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>> DEFAULT group default qlen 1000
>>      link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
>> # logout
>> $ ip link
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
>> state DOWN qlen 1000
>>
> I've probably misunderstood what you're trying to say. ifindexes are unique
> per
> boot and per netns.

I think we both misunderstood each other.  The ifindexes are unique
*per netns*, which means that, if you're unprivileged in a netns,
global information doesn't leak to you.  I think this is good.

>>
>> Let me try again, with emphasis in the right place.
>>
>> I think that *code running in a namespace* has no business even
>> knowing a unique identity of *that namespace* from the perspective of
>> the host.
>>
>> In your example, if there's a veth device between netns A and netns B,
>> then code *in netns A* has no business knowing the identity of its
>> veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
>> no business knowing the identity of its own netns (A) other than as
>> "my netns".
>
> I do not agree (see the example below).
>
>
>>
>> If A and B are siblings, then their parent needs to know where that
>> veth device goes, but I think this is already the case to a sufficient
>> extent today.
>
> I'm not aware of a hierarchy between netns. A daemon should be able to
> get the full network configuration even if it is started after this
> configuration has been applied, i.e. even if it doesn't know what happened
> before it started.
>

I don't know exactly which namespaces have an explicit hierarchy, but
there is certainly a hierarchy of *user* namespaces, and network
namespaces live in user namespaces, so they at least have somewhat of
a hierarchy.

>
>>
>> I feel like this discussion is falling into a common trap of new API
>> discussions.  Can one of you who wants this API please articulate,
>> with a reasonably precise example, what it is that you want to do, why
>> you can't easily do it already, and how this API helps?  I currently
>> understand how the API creates problems, but I don't understand how it
>> solves any problems, and I will NAK it (and I suspect that Eric will,
>> too, which is pretty much fatal) unless that changes.
>
> What I'm trying to solve is getting full information in the netlink messages
> sent by the kernel, so that a peer netns can be identified (this is close to
> what the audit people are trying to achieve). Theoretically, messages sent by
> the kernel can be reused as-is to recreate the same configuration. This is not
> the case with x-netns devices. Here is an example, with IP tunnels:
>
> $ ip netns add 1
> $ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
> $ ip -d link ls ipip1
> 8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode
> DEFAULT group default
>     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>     ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
> $ ip link set ipip1 netns 1
> $ ip netns exec 1 ip -d link ls ipip1
> 8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
> mode DEFAULT group default
>     link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>     ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc
>
> Now the information reported by 'ip link' is wrong and incomplete:
>  - the link dev is now tunl0 instead of eth0, because we only get an ifindex
>    from the kernel, without any netns information.
>  - the encapsulation addresses are not part of this netns, but the user
>    doesn't know that (again because netns info is missing). These IPv4
>    addresses may also exist in this netns.
>  - it's not possible to recreate the same netdevice from this information.
>

Aha.  That's a genuine problem.

Perhaps we need a concept of which netnses should be able to see each other.

I think I would be okay with a somewhat different outcome from your example:

$ ip netns exec 1 ip -d link ls ipip1
8: ipip1@[unknown device in another namespace]:
<POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN

I think this outcome is mandatory if netns 1 lives in a subsidiary
user namespace.

Certainly, if you do the 'ip link' in the original namespace, I agree
that this should work.

For most namespace types, this all works transparently, since
everything has a real identity all the way up the hierarchy.  Network
namespaces are different.

I don't think that exposing serial numbers in /proc is a good
solution, both for the reasons already described and because I don't
think that iproute2 should need to muck around with /proc to function
correctly.  Eric, any clever ideas here?  Do we need fancier netlink
messages for this?
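
For reference, this is roughly what reading the existing /proc namespace
links looks like from userspace. Each /proc/<pid>/ns/<type> entry is a
symlink whose text has the form type:[inode]. The sketch below only parses
that existing format; the helper names are hypothetical, and it does not use
the proposed <ns>_snum entries.

```python
import os
import re

# Link text of /proc/<pid>/ns/<type>, for example "net:[4026531956]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(text):
    """Split a namespace link's text into (type, inode number)."""
    match = _NS_LINK.match(text)
    if match is None:
        raise ValueError(f"unexpected namespace link text: {text!r}")
    return match.group(1), int(match.group(2))


def current_ns(kind="net", pid="self"):
    """Read and parse the namespace link of a task (requires Linux /proc)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))
```

Two tasks are in the same namespace of a given type when their links resolve
to the same inode on the same device, so a tool that wants to compare
namespaces has to go through /proc in this way.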

--Andy


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]                         ` <CALCETrWHrWhm89B5s=pLt_9eTx3ZF8ifA6y6CwknWaWU7dp=sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-08-25 16:41                           ` Nicolas Dichtel
       [not found]                             ` <53FB673F.8070200-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Dichtel @ 2014-08-25 16:41 UTC
  To: Andy Lutomirski
  Cc: Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Eric W. Biederman,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Richard Guy Briggs,
	netdev

On 25/08/2014 18:13, Andy Lutomirski wrote:
> On Mon, Aug 25, 2014 at 8:43 AM, Nicolas Dichtel
> <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:
>> On 25/08/2014 16:04, Andy Lutomirski wrote:
>>
>>> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel@6wind.com>
>>> wrote:
>>>>>
>>>>> CRIU wants to save the complete state of a namespace and then restore
>>>>> it.  For that to work, any information exposed to things in the
>>>>> namespace *cannot* be globally unique or unique per boot, since CRIU
>>>>> needs to arrange for that information to match whatever it was when
>>>>> CRIU saved it.
>>>>
>>>>
>>>> How are ifindex of network devices managed? These ifindexes are unique
>>>> per boot,
>>>> thus can change depending on the order in which netdev are created.
>>>> These ifindexes are unique per boot and exposed to userspace ...
>>>>
>>>
>>> This does not appear to be true.
>>>
>>> $ sudo unshare --net
>>> # ip link add veth0 type veth peer name veth1
>>> # ip link
>>> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
>>> default
>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>>> DEFAULT group default qlen 1000
>>>       link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
>>> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>>> DEFAULT group default qlen 1000
>>>       link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
>>> # logout
>>> $ ip link
>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
>>> state DOWN qlen 1000
>>>
>> I've probably misunderstood what you're trying to say. ifindexes are unique
>> per
>> boot and per netns.
>
> I think we both misunderstood each other.  The ifindexes are unique
> *per netns*, which means that, if you're unprivileged in a netns,
> global information doesn't leak to you.  I think this is good.
OK, I agree. I think audit daemons always run as privileged users.

>
>>>
>>> Let me try again, with emphasis in the right place.
>>>
>>> I think that *code running in a namespace* has no business even
>>> knowing a unique identity of *that namespace* from the perspective of
>>> the host.
>>>
>>> In your example, if there's a veth device between netns A and netns B,
>>> then code *in netns A* has no business knowing the identity of its
>>> veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
>>> no business knowing the identity of its own netns (A) other than as
>>> "my netns".
>>
>> I do not agree (see the example below).
>>
>>
>>>
>>> If A and B are siblings, then their parent needs to know where that
>>> veth device goes, but I think this is already the case to a sufficient
>>> extent today.
>>
>> I'm not aware of a hierarchy between netns. A daemon should be able to
>> get the full network configuration even if it is started after this
>> configuration has been applied, i.e. even if it doesn't know what happened
>> before it started.
>>
>
> I don't know exactly which namespaces have an explicit hierarchy, but
> there is certainly a hierarchy of *user* namespaces, and network
> namespaces live in user namespaces, so they at least have somewhat of
> a hierarchy.
>
>>
>>>
>>> I feel like this discussion is falling into a common trap of new API
>>> discussions.  Can one of you who wants this API please articulate,
>>> with a reasonably precise example, what it is that you want to do, why
>>> you can't easily do it already, and how this API helps?  I currently
>>> understand how the API creates problems, but I don't understand how it
>>> solves any problems, and I will NAK it (and I suspect that Eric will,
>>> too, which is pretty much fatal) unless that changes.
>>
>> What I'm trying to solve is getting full information in the netlink messages
>> sent by the kernel, so that a peer netns can be identified (this is close to
>> what the audit people are trying to achieve). Theoretically, messages sent by
>> the kernel can be reused as-is to recreate the same configuration. This is not
>> the case with x-netns devices. Here is an example, with IP tunnels:
>>
>> $ ip netns add 1
>> $ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
>> $ ip -d link ls ipip1
>> 8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode
>> DEFAULT group default
>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>      ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
>> $ ip link set ipip1 netns 1
>> $ ip netns exec 1 ip -d link ls ipip1
>> 8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
>> mode DEFAULT group default
>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>      ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc
>>
>> Now the information reported by 'ip link' is wrong and incomplete:
>>   - the link dev is now tunl0 instead of eth0, because we only get an ifindex
>>     from the kernel, without any netns information.
>>   - the encapsulation addresses are not part of this netns, but the user
>>     doesn't know that (again because netns info is missing). These IPv4
>>     addresses may also exist in this netns.
>>   - it's not possible to recreate the same netdevice from this information.
>>
>
> Aha.  That's a genuine problem.
>
> Perhaps we need a concept of which netnses should be able to see each other.
Yes, I agree. This is not required for all netns, only a subset of netns should
be able to see each other.

>
> I think I would be okay with a somewhat different outcome from your example:
>
> $ ip netns exec 1 ip -d link ls ipip1
> 8: ipip1@[unknown device in another namespace]:
> <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
>
> I think this outcome is mandatory if netns 1 lives in a subsidiary
> user namespace.
Yes.

>
> Certainly, if you do the 'ip link' in the original namespace, I agree
> that this should work.
And yes :)

I will update my previous proposal
(http://thread.gmane.org/gmane.linux.network/315933/focus=321753)
so that an ID for a peer netns can be obtained only when the user namespace is
the same.
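
A minimal sketch of the visibility rule agreed on above, assuming each netns
records the user namespace that owns it. The NetNs type, its fields and the
output strings are hypothetical illustrations, not an existing kernel or
iproute2 interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetNs:
    """Hypothetical handle for a network namespace and its owning user ns."""
    name: str
    user_ns: str


def describe_lower_device(viewer, link_netns, link_name):
    """Return what a viewer in netns 'viewer' is told about a lower device
    that lives in netns 'link_netns'."""
    if viewer == link_netns:
        # Same netns: the plain device name is meaningful.
        return link_name
    if viewer.user_ns == link_netns.user_ns:
        # Different netns, same user namespace: the peer may be identified.
        return f"{link_name} (in netns {link_netns.name})"
    # Different user namespace: nothing about the peer is revealed.
    return "[unknown device in another namespace]"


host = NetNs(name="host", user_ns="init_user_ns")
ns1 = NetNs(name="1", user_ns="init_user_ns")
container = NetNs(name="c1", user_ns="child_user_ns")
```

With these values, a viewer in ns1 is told that the lower device is eth0 in
netns host, while a viewer in container only sees the placeholder string.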

>
> For most namespace types, this all works transparently, since
> everything has a real identity all the way up the hierarchy.  Network
> namespaces are different.
>
> I don't think that exposing serial numbers in /proc is a good
> solution, both for the reasons already described and because I don't
> think that iproute2 should need to muck around with /proc to function
A netlink API is probably enough. But it will help only for the network
problem, not for audit. I was hoping to find a common solution.

> correctly.  Eric, any clever ideas here?  Do we need fancier netlink
> messages for this?
>
> --Andy
>


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
       [not found]                             ` <53FB673F.8070200-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
@ 2014-08-25 16:50                               ` Andy Lutomirski
  2014-08-27 15:17                                 ` Richard Guy Briggs
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2014-08-25 16:50 UTC
  To: Nicolas Dichtel
  Cc: Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Eric W. Biederman,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Richard Guy Briggs,
	netdev

On Mon, Aug 25, 2014 at 9:41 AM, Nicolas Dichtel
<nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:
> On 25/08/2014 18:13, Andy Lutomirski wrote:
>
>> On Mon, Aug 25, 2014 at 8:43 AM, Nicolas Dichtel
>> <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:
>>>
>>> On 25/08/2014 16:04, Andy Lutomirski wrote:
>>>
>>>> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel@6wind.com>
>>>> wrote:
>>>>>>
>>>>>>
>>>>>> CRIU wants to save the complete state of a namespace and then restore
>>>>>> it.  For that to work, any information exposed to things in the
>>>>>> namespace *cannot* be globally unique or unique per boot, since CRIU
>>>>>> needs to arrange for that information to match whatever it was when
>>>>>> CRIU saved it.
>>>>>
>>>>>
>>>>>
>>>>> How are ifindex of network devices managed? These ifindexes are unique
>>>>> per boot,
>>>>> thus can change depending on the order in which netdev are created.
>>>>> These ifindexes are unique per boot and exposed to userspace ...
>>>>>
>>>>
>>>> This does not appear to be true.
>>>>
>>>> $ sudo unshare --net
>>>> # ip link add veth0 type veth peer name veth1
>>>> # ip link
>>>> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
>>>> default
>>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>>>> DEFAULT group default qlen 1000
>>>>       link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
>>>> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
>>>> DEFAULT group default qlen 1000
>>>>       link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
>>>> # logout
>>>> $ ip link
>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
>>>> state DOWN qlen 1000
>>>>
>>> I've probably misunderstood what you're trying to say. ifindexes are
>>> unique per boot and per netns.
>>
>>
>> I think we both misunderstood each other.  The ifindexes are unique
>> *per netns*, which means that, if you're unprivileged in a netns,
>> global information doesn't leak to you.  I think this is good.
>
> Ok, I agree. I think audit daemons are always running under privileged
> users.
>
>
>>
>>>>
>>>> Let me try again, with emphasis in the right place.
>>>>
>>>> I think that *code running in a namespace* has no business even
>>>> knowing a unique identity of *that namespace* from the perspective of
>>>> the host.
>>>>
>>>> In your example, if there's a veth device between netns A and netns B,
>>>> then code *in netns A* has no business knowing the identity of its
>>>> veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
>>>> no business knowing the identity of its own netns (A) other than as
>>>> "my netns".
>>>
>>>
>>> I do not agree (see the example below).
>>>
>>>
>>>>
>>>> If A and B are siblings, then their parent needs to know where that
>>>> veth device goes, but I think this is already the case to a sufficient
>>>> extent today.
>>>
>>>
>>> I'm not aware of a hierarchy between netns. A daemon should be able to
>>> get the full network configuration, even if it's started when this
>>> configuration is already applied, i.e. even if it doesn't know what
>>> happened before it started.
>>>
>>
>> I don't know exactly which namespaces have an explicit hierarchy, but
>> there is certainly a hierarchy of *user* namespaces, and network
>> namespaces live in user namespaces, so they at least have somewhat of
>> a hierarchy.
>>
>>>
>>>>
>>>> I feel like this discussion is falling into a common trap of new API
>>>> discussions.  Can one of you who wants this API please articulate,
>>>> with a reasonably precise example, what it is that you want to do, why
>>>> you can't easily do it already, and how this API helps?  I currently
>>>> understand how the API creates problems, but I don't understand how it
>>>> solves any problems, and I will NAK it (and I suspect that Eric will,
>>>> too, which is pretty much fatal) unless that changes.
>>>
>>>
>>> What I'm trying to solve is to have full info in netlink messages sent
>>> by the kernel, thus being able to identify a peer netns (and this is
>>> close to what the audit guys are trying to have). Theoretically, messages
>>> sent by the kernel can be reused as is to recreate the same
>>> configuration. This is not the case with x-netns devices. Here is an
>>> example, with ip tunnels:
>>>
>>> $ ip netns add 1
>>> $ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
>>> $ ip -d link ls ipip1
>>> 8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>>      ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
>>> $ ip link set ipip1 netns 1
>>> $ ip netns exec 1 ip -d link ls ipip1
>>> 8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
>>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
>>>      ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc
>>>
>>> Now the information obtained with 'ip link' is wrong and incomplete:
>>>   - the link dev is now tunl0 instead of eth0, because we only got an
>>>     ifindex from the kernel without any netns information.
>>>   - the encapsulation addresses are not part of this netns but the user
>>>     doesn't know that (again because netns info is missing). These IPv4
>>>     addresses may exist in this netns.
>>>   - it's not possible to create the same netdevice from this info.
>>>
>>
>> Aha.  That's a genuine problem.
>>
>> Perhaps we need a concept of which netnses should be able to see each
>> other.
>
> Yes, I agree. This is not required for all netns; only a subset of netns
> should be able to see each other.
>
>>
>> I think I would be okay with a somewhat different outcome from your
>> example:
>>
>> $ ip netns exec 1 ip -d link ls ipip1
>> 8: ipip1@[unknown device in another namespace]: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
>>
>> I think this outcome is mandatory if netns 1 lives in a subsidiary
>> user namespace.
>
> Yes.
>
>
>>
>> Certainly, if you do the 'ip link' in the original namespace, I agree
>> that this should work.
>
> And yes :)
>
> I will update my previous proposal
> (http://thread.gmane.org/gmane.linux.network/315933/focus=321753)
> so that an id for a peer netns can be obtained only when the user
> namespace is the same.
>

I think it should work if the peer userns is the same or a descendant.
I also wonder whether the peer's ifindex should be suppressed if the peer
userns is not the same or a descendant.
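
To make that rule concrete, here is a toy userspace model in Python. It is
not kernel code: the UserNs class and the helpers is_same_or_descendant()
and peer_link_info() are hypothetical names invented for illustration, and
nothing like them exists in the kernel or iproute2. It only sketches "show
the peer's netns id and ifindex when the peer userns is the viewer's own or
one below it, suppress both otherwise".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNs:
    """Toy user namespace: a name plus a link to its parent (None for init)."""
    name: str
    parent: Optional["UserNs"] = None


def is_same_or_descendant(peer: UserNs, viewer: UserNs) -> bool:
    """True if `peer` is `viewer` itself or lies below it in the hierarchy."""
    ns: Optional[UserNs] = peer
    while ns is not None:
        if ns is viewer:
            return True
        ns = ns.parent  # walk up towards the initial user namespace
    return False


def peer_link_info(viewer: UserNs, peer_userns: UserNs,
                   peer_netns_id: int, peer_ifindex: int) -> dict:
    """Report the peer's netns id and ifindex only when the viewer may see them."""
    if is_same_or_descendant(peer_userns, viewer):
        return {"netns_id": peer_netns_id, "ifindex": peer_ifindex}
    # Otherwise suppress both, so nothing about the other namespace leaks.
    return {"netns_id": None, "ifindex": None}


init_ns = UserNs("init")
container = UserNs("container", parent=init_ns)

# Host looking at a peer owned by a child userns: both fields are reported.
print(peer_link_info(init_ns, container, peer_netns_id=7, peer_ifindex=8))
# Container looking at a peer owned by the host userns: both are suppressed.
print(peer_link_info(container, init_ns, peer_netns_id=1, peer_ifindex=2))
```

Siblings come out as "suppressed" too, since neither is an ancestor of the
other.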

Now you just have to get Eric to be happy with the id allocation. :)
This may be nontrivial.

>
>>
>> For most namespace types, this all works transparently, since
>> everything has a real identity all the way up the hierarchy.  Network
>> namespaces are different.
>>
>> I don't think that exposing serial numbers in /proc is a good
>> solution, both for the reasons already described and because I don't
>> think that iproute2 should need to muck around with /proc to function
>
> A netlink API is probably enough. But it will help only for the network
> problem, not for audit. I was hoping to find a common solution.

I still don't understand why audit needs anything beyond the audit
part of this patch set.  I have no problem with audit seeing that
migrated/restored namespaces are really brand-new namespaces, as long
as the code in those namespaces isn't exposed to it.

>
>
>> correctly.  Eric, any clever ideas here?  Do we need fancier netlink
>> messages for this?
>>
>> --Andy
>>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc
  2014-08-25 16:50                               ` Andy Lutomirski
@ 2014-08-27 15:17                                 ` Richard Guy Briggs
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Guy Briggs @ 2014-08-27 15:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, Linux Containers, linux-kernel@vger.kernel.org,
	linux-audit, Eric W. Biederman, netdev, Serge E. Hallyn

On 14/08/25, Andy Lutomirski wrote:
> On Mon, Aug 25, 2014 at 9:41 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
> > Le 25/08/2014 18:13, Andy Lutomirski a écrit :
> >
> >> On Mon, Aug 25, 2014 at 8:43 AM, Nicolas Dichtel
> >> <nicolas.dichtel@6wind.com> wrote:
> >>>
> >>> Le 25/08/2014 16:04, Andy Lutomirski a écrit :
> >>>
> >>>> On Aug 25, 2014 6:30 AM, "Nicolas Dichtel" <nicolas.dichtel@6wind.com>
> >>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>> CRIU wants to save the complete state of a namespace and then restore
> >>>>>> it.  For that to work, any information exposed to things in the
> >>>>>> namespace *cannot* be globally unique or unique per boot, since CRIU
> >>>>>> needs to arrange for that information to match whatever it was when
> >>>>>> CRIU saved it.
> >>>>>
> >>>>>
> >>>>>
> >>>>> How are the ifindexes of network devices managed? These ifindexes are
> >>>>> unique per boot, and thus can change depending on the order in which
> >>>>> netdevs are created. They are also exposed to userspace ...
> >>>>>
> >>>>
> >>>> This does not appear to be true.
> >>>>
> >>>> $ sudo unshare --net
> >>>> # ip link add veth0 type veth peer name veth1
> >>>> # ip link
> >>>> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
> >>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >>>> 2: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> >>>>       link/ether 06:0d:59:c7:a6:a8 brd ff:ff:ff:ff:ff:ff
> >>>> 3: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> >>>>       link/ether b2:5c:8b:f2:12:28 brd ff:ff:ff:ff:ff:ff
> >>>> # logout
> >>>> $ ip link
> >>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> >>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >>>> 3: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
> >>>>
> >>> I've probably misunderstood what you're trying to say. ifindexes are
> >>> unique per boot and per netns.
> >>
> >>
> >> I think we both misunderstood each other.  The ifindexes are unique
> >> *per netns*, which means that, if you're unprivileged in a netns,
> >> global information doesn't leak to you.  I think this is good.
> >
> > Ok, I agree. I think audit daemons are always running under privileged
> > users.
> >
> >
> >>
> >>>>
> >>>> Let me try again, with emphasis in the right place.
> >>>>
> >>>> I think that *code running in a namespace* has no business even
> >>>> knowing a unique identity of *that namespace* from the perspective of
> >>>> the host.
> >>>>
> >>>> In your example, if there's a veth device between netns A and netns B,
> >>>> then code *in netns A* has no business knowing the identity of its
> >>>> veth peer if its peer (B) is a sibling or ancestor.  It also IMO has
> >>>> no business knowing the identity of its own netns (A) other than as
> >>>> "my netns".
> >>>
> >>>
> >>> I do not agree (see the example below).
> >>>
> >>>
> >>>>
> >>>> If A and B are siblings, then their parent needs to know where that
> >>>> veth device goes, but I think this is already the case to a sufficient
> >>>> extent today.
> >>>
> >>>
> >>> I'm not aware of a hierarchy between netns. A daemon should be able to
> >>> get the full network configuration, even if it's started when this
> >>> configuration is already applied, i.e. even if it doesn't know what
> >>> happened before it started.
> >>>
> >>
> >> I don't know exactly which namespaces have an explicit hierarchy, but
> >> there is certainly a hierarchy of *user* namespaces, and network
> >> namespaces live in user namespaces, so they at least have somewhat of
> >> a hierarchy.
> >>
> >>>
> >>>>
> >>>> I feel like this discussion is falling into a common trap of new API
> >>>> discussions.  Can one of you who wants this API please articulate,
> >>>> with a reasonably precise example, what it is that you want to do, why
> >>>> you can't easily do it already, and how this API helps?  I currently
> >>>> understand how the API creates problems, but I don't understand how it
> >>>> solves any problems, and I will NAK it (and I suspect that Eric will,
> >>>> too, which is pretty much fatal) unless that changes.
> >>>
> >>>
> >>> What I'm trying to solve is to have full info in netlink messages sent
> >>> by the kernel, thus being able to identify a peer netns (and this is
> >>> close to what the audit guys are trying to have). Theoretically, messages
> >>> sent by the kernel can be reused as is to recreate the same
> >>> configuration. This is not the case with x-netns devices. Here is an
> >>> example, with ip tunnels:
> >>>
> >>> $ ip netns add 1
> >>> $ ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249 dev eth0
> >>> $ ip -d link ls ipip1
> >>> 8: ipip1@eth0: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
> >>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
> >>>      ipip remote 10.16.0.121 local 10.16.0.249 dev eth0 ttl inherit pmtudisc
> >>> $ ip link set ipip1 netns 1
> >>> $ ip netns exec 1 ip -d link ls ipip1
> >>> 8: ipip1@tunl0: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
> >>>      link/ipip 10.16.0.249 peer 10.16.0.121 promiscuity 0
> >>>      ipip remote 10.16.0.121 local 10.16.0.249 dev tunl0 ttl inherit pmtudisc
> >>>
> >>> Now the information obtained with 'ip link' is wrong and incomplete:
> >>>   - the link dev is now tunl0 instead of eth0, because we only got an
> >>>     ifindex from the kernel without any netns information.
> >>>   - the encapsulation addresses are not part of this netns but the user
> >>>     doesn't know that (again because netns info is missing). These IPv4
> >>>     addresses may exist in this netns.
> >>>   - it's not possible to create the same netdevice from this info.
> >>>
> >>
> >> Aha.  That's a genuine problem.
> >>
> >> Perhaps we need a concept of which netnses should be able to see each
> >> other.
> >
> > Yes, I agree. This is not required for all netns; only a subset of netns
> > should be able to see each other.
> >
> >>
> >> I think I would be okay with a somewhat different outcome from your
> >> example:
> >>
> >> $ ip netns exec 1 ip -d link ls ipip1
> >> 8: ipip1@[unknown device in another namespace]: <POINTOPOINT,NOARP,M-DOWN> mtu 1480 qdisc noop state DOWN
> >>
> >> I think this outcome is mandatory if netns 1 lives in a subsidiary
> >> user namespace.
> >
> > Yes.
> >
> >
> >>
> >> Certainly, if you do the 'ip link' in the original namespace, I agree
> >> that this should work.
> >
> > And yes :)
> >
> > I will update my previous proposal
> > (http://thread.gmane.org/gmane.linux.network/315933/focus=321753)
> > so that an id for a peer netns can be obtained only when the user
> > namespace is the same.
> 
> I think it should work if the peer userns is the same or a descendant.
> I also wonder whether the peer's ifindex should be suppressed if the peer
> userns is not the same or a descendant.
> 
> Now you just have to get Eric to be happy with the id allocation. :)
> This may be nontrivial.
> 
> >> For most namespace types, this all works transparently, since
> >> everything has a real identity all the way up the hierarchy.  Network
> >> namespaces are different.
> >>
> >> I don't think that exposing serial numbers in /proc is a good
> >> solution, both for the reasons already described and because I don't
> >> think that iproute2 should need to muck around with /proc to function
> >
> > A netlink API is probably enough. But it will help only for the network
> > problem, not for audit. I was hoping to find a common solution.
> 
> I still don't understand why audit needs anything beyond the audit
> part of this patch set.  I have no problem with audit seeing that
> migrated/restored namespaces are really brand-new namespaces, as long
> as the code in those namespaces isn't exposed to it.

Ok, I'm starting to get this...  Perhaps /proc wasn't the best place to
expose this.  Audit or an audit aggregator is the only one that needs to
know any of this information.  This could be accomplished with
CAP_AUDIT_CONTROL and a new netlink audit message type to fetch
individual or all namespace IDs for a particular PID via auditctl, or by
having a CAP_AUDIT_WRITE-capable application pull the trigger to simply
dump that information to the log.
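
For comparison, the per-namespace identifier userspace can already read is
the inode number in the /proc/<pid>/ns/<ns> link text, e.g.
"net:[4026531956]". Below is a rough Python sketch of collecting those for
one PID. The helper names parse_ns_link() and namespaces_of() are made up
for illustration. This reads the existing /proc links; it does not use the
serial numbers from this patch set or any new audit netlink message type,
neither of which exists outside this discussion.

```python
import os
import re

# Link text of /proc/<pid>/ns/<ns> looks like "net:[4026531956]".
_NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inum>\d+)\]$")


def parse_ns_link(text: str) -> tuple:
    """Split a namespace link text into (type, inode number)."""
    m = _NS_LINK.match(text)
    if m is None:
        raise ValueError(f"unexpected ns link text: {text!r}")
    return m.group("type"), int(m.group("inum"))


def namespaces_of(pid) -> dict:
    """Map each entry in /proc/<pid>/ns to the inode number in its link text."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _ns_type, inum = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inum
    return result


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    # Needs a Linux /proc; reading another user's PID may need privilege.
    for name, inum in namespaces_of("self").items():
        print(f"{name}: {inum}")
```

Anything audit-specific would still need its own interface and capability
check; the sketch only shows what is visible from /proc.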

> >> correctly.  Eric, any clever ideas here?  Do we need fancier netlink
> >> messages for this?
> >>
> >> --Andy
> 
> Andy Lutomirski

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545


Thread overview: 7+ messages
     [not found] <cover.1408581429.git.rgb@redhat.com>
     [not found] ` <cd6cd0622ce677b639afae18a69ff79c72490bab.1408581429.git.rgb@redhat.com>
     [not found]   ` <CALCETrUnzG1V8w+H9ctAJP+Hvo8LQax=dhLG4bBpBKmVi+C1cQ@mail.gmail.com>
     [not found]     ` <20140822015803.GG20529@madcap2.tricolour.ca>
     [not found]       ` <CALCETrUkFD0iNi1SV_6ypN5Kf4GYybT5tzjRjRQuLzT9iBnQAg@mail.gmail.com>
     [not found]         ` <CALCETrUkFD0iNi1SV_6ypN5Kf4GYybT5tzjRjRQuLzT9iBnQAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-25 13:30           ` [PATCH V4 3/8] namespaces: expose ns instance serial numbers in proc Nicolas Dichtel
     [not found]             ` <53FB3A86.2060203-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
2014-08-25 14:04               ` Andy Lutomirski
     [not found]                 ` <CALCETrW1Lv0qeccMjNHSEzgtiaNN3NgJVR1dFjjR_dw5KVVnqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-25 15:43                   ` Nicolas Dichtel
     [not found]                     ` <53FB59A3.5030804-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
2014-08-25 16:13                       ` Andy Lutomirski
     [not found]                         ` <CALCETrWHrWhm89B5s=pLt_9eTx3ZF8ifA6y6CwknWaWU7dp=sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-25 16:41                           ` Nicolas Dichtel
     [not found]                             ` <53FB673F.8070200-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
2014-08-25 16:50                               ` Andy Lutomirski
2014-08-27 15:17                                 ` Richard Guy Briggs
