From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Ahern <dsahern@gmail.com>,
Stephen Hemminger <stephen@networkplumber.org>,
netdev@vger.kernel.org,
Nicolas Dichtel <nicolas.dichtel@6wind.com>,
Christian Brauner <brauner@kernel.org>,
David Laight <David.Laight@ACULAB.COM>
Subject: Re: [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces
Date: Wed, 11 Oct 2023 17:03:54 +0200
Message-ID: <87lec9xkth.fsf@toke.dk>
In-Reply-To: <871qe1i4z7.fsf@email.froward.int.ebiederm.org>

"Eric W. Biederman" <ebiederm@xmission.com> writes:
> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>
>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>>
>>>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>>>
>>>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>>>>>
>>>>>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>>>>>
>>>>>>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>
>>> My proposal:
>>>
>>> On "ip netns add NAME"
>>> - create the network namespace and mount it at /run/netns/NAME
>>> - mount the appropriate sysfs at /run/netns-mounts/NAME/sys
>>> - mount the appropriate bpffs at /run/netns-mounts/NAME/sys/fs/bpf
>>>
>>> On "ip netns delete NAME"
>>> - umount --recursive /run/netns-mounts/NAME
>>> - unlink /run/netns-mounts/NAME
>>> - cleanup /run/netns/NAME as we do today.
>>>
>>> On "ip netns exec NAME"
>>> - Walk through /run/netns-mounts/NAME like we do /etc/netns/NAME/
>>> and perform bind mounts.
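A minimal shell sketch of the three steps in the proposal above, written as wrapper functions around today's commands. It is not iproute2 code: `netns_add`, `netns_delete`, `netns_exec`, the `run` wrapper and its `DRY_RUN` switch are hypothetical names, added so the command sequence can be inspected without root (a real run needs CAP_SYS_ADMIN). Two details go beyond the proposal as written: sysfs is mounted through `nsenter --net=...`, because a sysfs instance shows the network devices of the namespace it was mounted from, and the directories are removed with `rmdir`, since `unlink` does not remove directories.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed "ip netns" layout; not iproute2 code.
# With DRY_RUN set, commands are printed instead of executed (no root needed).

NETNS_RUN_DIR=/run/netns
NETNS_MOUNTS_DIR=/run/netns-mounts

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# "ip netns add NAME": create the netns, then a persistent /sys tree for it.
netns_add() {
    name=$1
    sys="$NETNS_MOUNTS_DIR/$name/sys"
    run ip netns add "$name"
    run mkdir -p "$sys"
    # sysfs reflects the netns of the process that mounts it, so enter only
    # the new netns and stay in the current mount namespace.
    run nsenter --net="$NETNS_RUN_DIR/$name" mount -t sysfs sysfs "$sys"
    # bpffs goes on the fs/bpf directory that sysfs provides.
    run mount -t bpf bpf "$sys/fs/bpf"
}

# "ip netns delete NAME": tear down the persistent mounts, then the netns.
netns_delete() {
    name=$1
    dir="$NETNS_MOUNTS_DIR/$name"
    run umount --recursive "$dir/sys"   # also detaches the bpffs below it
    run rmdir "$dir/sys" "$dir"         # directories: rmdir, not unlink
    run ip netns delete "$name"
}

# "ip netns exec NAME CMD...": enter the netns and a new mount namespace
# whose mounts are slaves of the parent (like 'mount --make-rslave /'),
# then replace /sys with the persistent tree in one recursive bind.
netns_exec() {
    name=$1
    shift
    run nsenter --net="$NETNS_RUN_DIR/$name" \
        unshare --mount --propagation slave \
        sh -c 'umount --lazy /sys && mount --rbind "$1" /sys && shift && exec "$@"' \
        sh "$NETNS_MOUNTS_DIR/$name/sys" "$@"
}
```

For example, `DRY_RUN=1 netns_add blue` prints the four commands for a namespace called `blue` without touching the system. The lazy unmount mirrors the way the old `/sys` has to be detached together with whatever is mounted below it before the persistent tree can take its place.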
>>
>> If we set up the full /sys hierarchy in /run/netns-mounts/NAME, this
>> basically becomes a single recursive bind mount, doesn't it?
>
> Yes.
>
>> What about if we also include bind mounts from the host namespace into
>> that separate /sys instance? Will those be included into a recursive
>> bind into /sys inside the mount-ns, or will we have to walk the tree and
>> do separate bind mounts for each directory?
>
> If /run/netns-mounts/NAME/sys has everything you want,
>
> mount --rbind /run/netns-mounts/NAME/sys /sys
>
> Will result in a /sys that has everything you want.
>
>> Anyway, this scheme sounds like it'll solve the issue I was trying to
>> address so I don't mind doing it this way. I'll try it out and respin
>> the patch series.
>
> Thanks that sounds like a way forward.
>
>
>>>>> Mount propagation is a way to configure a mount namespace (before
>>>>> creating a new one) that will cause mounts created in the first mount
>>>>> namespace to be created in its children, and cause mounts created in
>>>>> the children to be created in the parent (depending on how things are
>>>>> configured).
>>>>>
>>>>> It is not my favorite feature (it makes locking of mount namespaces
>>>>> terrible) and it is probably too clever by half; unfortunately, systemd
>>>>> started enabling mount propagation by default, so we are stuck with it.
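The propagation state described above is visible per mount in the optional fields of `/proc/self/mountinfo`: `shared:N` marks a member of peer group N, `master:N` marks a slave of peer group N, and no optional field means the mount is private. The sketch below is a small hypothetical helper (`mount_propagation` is not an existing tool) that decodes those fields, for instance to see whether systemd's default shared propagation is in effect, or whether a namespace created with `--make-rslave /` really holds only slaves.

```shell
#!/bin/sh
# Hypothetical helper: print "MOUNTPOINT PROPAGATION" for every mount in a
# mountinfo file (default: /proc/self/mountinfo). Per proc(5), field 5 is
# the mount point and the optional fields run from field 7 up to a lone "-".
# A mount with no optional fields is private.
mount_propagation() {
    awk '{
        prop = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            prop = prop (prop == "" ? "" : ",") $i
        if (prop == "")
            prop = "private"
        print $5, prop
    }' "${1:-/proc/self/mountinfo}"
}
```

Running `mount_propagation` with no argument inspects the calling process's own namespace. util-linux's `findmnt -o TARGET,PROPAGATION` reports the same information in a more readable form when it is available.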
>>>>
>>>> Right. AFAICT the current iproute2 code explicitly tries to avoid that
>>>> when creating a mountns (it does a 'mount --make-rslave /'); so you're
>>>> saying we should change that?
>>>
>>> If it makes sense.
>>>
>>> I believe I added the 'mount --make-rslave /' because otherwise all
>>> mount activity was propagating back, and making a mess. Especially when
>>> I was unmounting /sys.
>>>
>>> I am not a huge fan of mount propagation; it has lots of surprising
>>> little details that need to be set just right, to not cause problems.
>>
>> Ah, you were talking about propagation from inside the mountns to
>> outside? Didn't catch that at first...
>>
>>> With my proposal above I think we could in some carefully chosen
>>> places enable mount propagation without problem.
>>
>> One thing that comes to mind would be that if we create persistent /sys
>> instances in /run/netns-mounts per the above, it would make sense for
>> any modifications done inside the netns to be propagated back to the
>> mount in /run; is this possible with a bind mount? Not sure I quite
>> understand how propagation would work in this case (since it would be a
>> separate (bind) mount point inside the namespace).
>
> Basically yes, but the challenge is in the details.
>
> If the initial propagation is set up properly, it will work. The
> weirdness is in how propagation works: there is a weird detail that
> it needs to be set up on the parent and not on the mount point.
>
> I think the formula is something like:
>
> mount --bind /run/netns-mounts/NAME/sys/ /run/netns-mounts/NAME/sys/
> mount --make-rshared /run/netns-mounts/NAME/sys/
> mount -t sysfs sysfs /run/netns-mounts/NAME/sys
>
> My memory is that systemd by default does
>
> mount --make-rshared /
>
> So the challenge may be to simply limit what is propagated to a
> controlled subset.

Alright, I'll play around with it and bug you some more if I can't get
it to work properly; thanks for the pointers! :)
-Toke
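The shared-/sys setup formula from the thread can be sketched as a shell function, with the same caveats as any sketch here: `setup_shared_sys`, `run` and `DRY_RUN` are hypothetical names, and `DRY_RUN` only prints the commands. `mount -t sysfs` is given an explicit `sysfs` source, and sysfs is mounted through `nsenter` so that it reflects the target netns. Whether mounts made later inside `ip netns exec` propagate back still depends on the exec side: a namespace created with `--make-rslave /` turns its copy of this mount into a slave, which receives mounts from here but does not send any back.

```shell
#!/bin/sh
# Hypothetical helper: give netns NAME a persistent, shared /sys tree under
# /run/netns-mounts. With DRY_RUN set, commands are only printed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_shared_sys() {
    name=$1
    sys="/run/netns-mounts/$name/sys"
    run mkdir -p "$sys"
    # Propagation is a property of a mount, not of a directory: turn the
    # directory into a mount point first (bind onto itself), then mark it.
    run mount --bind "$sys" "$sys"
    run mount --make-rshared "$sys"
    # A mount placed on top of a shared mount is itself made shared. Mount
    # sysfs from inside the netns so it lists that namespace's devices.
    run nsenter --net="/run/netns/$name" mount -t sysfs sysfs "$sys"
    # To check afterwards: findmnt -o TARGET,PROPAGATION "$sys"
}
```

The self bind mount is the "set it up on the parent" detail: the shared flag lives on the bind mount, and the sysfs mount stacked on it inherits shared propagation.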
Thread overview: 17+ messages
2023-10-09 18:27 [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces Toke Høiland-Jørgensen
2023-10-09 18:27 ` [RFC PATCH iproute2-next 1/5] ip: Mount netns in child process instead of from inside the new namespace Toke Høiland-Jørgensen
2023-10-09 18:27 ` [RFC PATCH iproute2-next 2/5] ip: Split out code creating namespace mount dir so it can be reused Toke Høiland-Jørgensen
2023-10-09 18:27 ` [RFC PATCH iproute2-next 3/5] lib/namespace: Factor out code for reuse Toke Høiland-Jørgensen
2023-10-09 18:27 ` [RFC PATCH iproute2-next 4/5] ip: Also create and persist mount namespace when creating netns Toke Høiland-Jørgensen
2023-10-09 18:27 ` [RFC PATCH iproute2-next 5/5] lib/namespace: Also mount a bpffs instance inside new mount namespaces Toke Høiland-Jørgensen
2023-10-09 20:32 ` [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces Eric W. Biederman
2023-10-09 22:03 ` Toke Høiland-Jørgensen
2023-10-10 0:14 ` Eric W. Biederman
2023-10-10 13:38 ` Toke Høiland-Jørgensen
2023-10-10 19:19 ` Eric W. Biederman
2023-10-11 13:49 ` Toke Høiland-Jørgensen
2023-10-11 14:55 ` Eric W. Biederman
2023-10-11 15:03 ` Toke Høiland-Jørgensen [this message]
2023-10-10 8:42 ` David Laight
2023-10-10 19:32 ` Eric W. Biederman
2023-10-10 21:51 ` David Laight