public inbox for bpf@vger.kernel.org
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Ahern <dsahern@kernel.org>, Christian Brauner <brauner@kernel.org>
Subject: Re: Persisting mounts between 'ip netns' invocations
Date: Thu, 28 Sep 2023 11:54:23 +0200
Message-ID: <2aa087b5-cbcf-e736-00d4-d962a9deda75@6wind.com>
In-Reply-To: <87a5t68zvw.fsf@toke.dk>

+ Eric

On 28/09/2023 at 10:29, Toke Høiland-Jørgensen wrote:
> Hi everyone
> 
> I recently ran into this problem again, and so I figured I'd ask if
> anyone has any good idea how to solve it:
> 
> When running a command through 'ip netns exec', iproute2 will
> "helpfully" create a new mount namespace and remount /sys inside it,
> AFAICT to make sure /sys/class/net/* refers to the right devices inside
> the namespace. This makes sense, but unfortunately it has the side
> effect that no mount commands executed inside the ns persist. In
> particular, this makes it difficult to work with bpffs; even when
> mounting a bpffs inside the ns, it will disappear along with the
> namespace as soon as the process exits.
> 
> To illustrate:
> 
> # ip netns exec <nsname> bpftool map pin id 2 /sys/fs/bpf/mymap
> # ip netns exec <nsname> ls /sys/fs/bpf
> <nothing>
> 
> This happens because namespaces are cleaned up as soon as they have no
> processes, unless they are persisted by some other means. For the
> network namespace itself, iproute2 will bind mount /proc/self/ns/net to
> /var/run/netns/<nsname> (in the root mount namespace) to persist the
> namespace. I tried implementing something similar for the mount
> namespace, but that doesn't work; I can't manually bind mount the 'mnt'
> ns reference either:
> 
> # mount -o bind /proc/104444/ns/mnt /var/run/netns/mnt/testns
> mount: /run/netns/mnt/testns: wrong fs type, bad option, bad superblock on /proc/104444/ns/mnt, missing codepage or helper program, or other error.
>        dmesg(1) may have more information after failed mount system call.
> 
> When running strace on that mount command, it seems the move_mount()
> syscall returns EINVAL, which, AFAICT, is because the mount namespace
> file references itself as its namespace, which means it can't be
> bind-mounted into the containing mount namespace.
> 
> So, my question is, how to overcome this limitation? I know it's
> possible to get a reference to the namespace of a running process, but
> there is no guarantee there are any processes running inside the
> namespace (hence the persisting bind mount for the netns). So is there
> some other way to persist the mount namespace reference, so we can pick
> it back up on the next 'ip netns' invocation?
> 
> Hoping someone has a good idea :)

We ran into similar problems. The only solution we found was to use nsenter
instead of 'ip netns exec'.

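To be able to bind mount a mount namespace onto a file, the mount containing
that file must be private: the kernel refuses to propagate a bind mount of a
mount namespace file, so binding it onto a shared mount that has peers or
slaves fails with EINVAL. For example: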

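mkdir -p /run/foo
# Make / recursively shared, so mounts made here later propagate into the
# slave mount namespace created by unshare below.
mount --make-rshared /
# Make /run/foo a mount point of its own and make it private, so the bind
# mount of the namespace file below is not propagated anywhere.
mount --bind /run/foo /run/foo
mount --make-private /run/foo
touch /run/foo/ns
# The child keeps printing its PID; read it once and bind mount its mnt ns
# file. When the pipe closes, 'yes' exits, but the bind mount keeps the
# namespace alive.
unshare --mount --propagation=slave -- sh -c 'yes $$ 2>/dev/null' | {
        read -r pid &&
        mount --bind "/proc/$pid/ns/mnt" /run/foo/ns
}
nsenter --mount=/run/foo/ns ls /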

But this doesn't work under 'ip netns exec'.
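Since nsenter is what worked for us, here is a minimal sketch of how the two
persisted handles could be combined: the netns handle that iproute2 bind
mounts under /var/run/netns/<nsname>, and a mount namespace handle such as
/run/foo/ns from the example above. The function name netns_mnt_exec and the
NSENTER override (handy for a dry run with NSENTER=echo) are hypothetical;
--net= and --mount= are regular nsenter(1) options.

```shell
# Hypothetical helper: run a command inside a named network namespace (as
# created by 'ip netns add') and inside a mount namespace persisted by a
# bind mount (e.g. /run/foo/ns), instead of using 'ip netns exec'.
# Assumption: the netns handle lives under /var/run/netns/<name>.
# NSENTER may be overridden (e.g. NSENTER=echo) to print the command only.
netns_mnt_exec() {
    if [ "$#" -lt 3 ]; then
        echo "usage: netns_mnt_exec <netns-name> <mntns-file> <command> [args...]" >&2
        return 2
    fi
    netns=$1
    mntns=$2
    shift 2
    # nsenter joins both namespaces before running the command, so mounts
    # it makes stay in the persisted mount namespace after it exits.
    ${NSENTER:-nsenter} --net="/var/run/netns/$netns" --mount="$mntns" -- "$@"
}
```

For example, 'netns_mnt_exec <nsname> /run/foo/ns mount -t bpf bpf /sys/fs/bpf'
followed later by 'netns_mnt_exec <nsname> /run/foo/ns ls /sys/fs/bpf' should
see the same bpffs, since nothing tears the mount namespace down between the
two calls. Unlike 'ip netns exec', this does not remount /sys, so
/sys/class/net inside the persisted mount namespace may not reflect the
devices of <nsname>.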


Regards,
Nicolas

Thread overview: 9+ messages
2023-09-28  8:29 Persisting mounts between 'ip netns' invocations Toke Høiland-Jørgensen
2023-09-28  9:54 ` Nicolas Dichtel [this message]
2023-09-28 16:17   ` Christian Brauner
2023-09-28 18:21     ` Toke Høiland-Jørgensen
2023-09-29  8:26       ` Nicolas Dichtel
2023-09-29  9:25         ` Christian Brauner
2023-09-29  9:45           ` Nicolas Dichtel
2023-09-29 21:23             ` David Laight
2023-09-29 15:00   ` Eric W. Biederman
