From: ebiederm@xmission.com (Eric W. Biederman)
To: menglong8.dong@gmail.com
Cc: mcgrof@kernel.org, josh@joshtriplett.org,
viro@zeniv.linux.org.uk, keescook@chromium.org,
samitolvanen@google.com, ojeda@kernel.org, johan@kernel.org,
bhelgaas@google.com, masahiroy@kernel.org,
dong.menglong@zte.com.cn, joe@perches.com, axboe@kernel.dk,
hare@suse.de, jack@suse.cz, tj@kernel.org,
gregkh@linuxfoundation.org, song@kernel.org, neilb@suse.de,
akpm@linux-foundation.org, f.fainelli@gmail.com, arnd@arndb.de,
linux@rasmusvillemoes.dk, wangkefeng.wang@huawei.com,
brho@google.com, mhiramat@kernel.org, rostedt@goodmis.org,
vbabka@suse.cz, glider@google.com, pmladek@suse.com,
chris@chrisdown.name, jojing64@gmail.com, terrelln@fb.com,
geert@linux-m68k.org, mingo@kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
jeyu@kernel.org
Subject: Re: [PATCH v2 2/3] init/do_cmounts.c: introduce 'user_root' for initramfs
Date: Tue, 25 May 2021 13:49:48 -0500
Message-ID: <m18s42odgz.fsf@fess.ebiederm.org>
In-Reply-To: <20210525141524.3995-3-dong.menglong@zte.com.cn> (menglong8's message of "Tue, 25 May 2021 22:15:23 +0800")
menglong8.dong@gmail.com writes:
> From: Menglong Dong <dong.menglong@zte.com.cn>
>
> If using container platforms such as Docker, upon initialization it
> wants to use pivot_root() so that currently mounted devices do not
> propagate to containers. An example of value in this is that
> a USB device connected prior to the creation of a container on the
> host gets disconnected after a container is created; if the
> USB device was mounted on containers, but already removed and
> umounted on the host, the mount point will not go away until all
> containers unmount the USB device.
>
> Another reason for container platforms such as Docker to use pivot_root
> is that upon initialization the net-namespace is mounted under
> /var/run/docker/netns/ on the host by dockerd. Without pivot_root
> Docker must either wait to create the network namespace prior to
> the creation of containers or simply deal with leaking this to each
> container.
>
> pivot_root is supported if the rootfs is an initrd or block device, but
> it's not supported if the rootfs uses an initramfs (tmpfs). This means
> container platforms today must resort to using block devices if
> they want to pivot_root from the rootfs. A workaround to use chroot()
> is not a clean viable option given every container will have a
> duplicate of every mount point on the host.
>
> In order to support using container platforms such as Docker on
> all the supported rootfs types we must extend Linux to support
> pivot_root on initramfs as well. This patch does the work to do
> just that.
>
> pivot_root will unmount the mount of the rootfs from its parent mount
> and mount the new root to it. However, when it comes to initramfs, it
> doesn't work, because the root filesystem has no parent mount, which
> makes initramfs not supported by pivot_root.
>
> In order to support pivot_root on initramfs we introduce a second
> "user_root" mount which is created before we do the cpio unpacking.
> The filesystem of the "user_root" mount is the same as the rootfs.
>
> While mounting the 'user_root', 'rootflags' is passed to it, which means
> that we can now set options for the rootfs mount on the boot command line.
> For example, the size of tmpfs can be set with 'rootflags=size=1024M'.
What is the flow where docker uses an initramfs?

Just thinking about this I am not able to connect the dots.

The way I imagine the world is that an initramfs will be used either
when a linux system boots for the first time, or an initramfs would
come from the distribution you are running inside a container. In
neither case do I see docker being in a position to add functionality
to the initramfs as docker is not responsible for it.

Is docker doing something creative like running a container in a VM,
and running some code directly out of the initramfs, and wanting that
code to exactly match the non-VM case?

If that is the case I think the easy solution would be to use
an actual ramdisk where pivot_root works.

I really don't see why it makes sense for docker to be a special
snowflake and require kernel features that no other distribution does.

It might make sense to create a completely empty filesystem underneath
an initramfs, and use that new rootfs as the unchanging root of the
mount tree, if it can be done with a trivial amount of code, and
generally makes everything cleaner.

As this change sits it looks like a lot of code to handle a problem
in the implementation of docker. Which quite frankly will be a pain
to have to maintain if this is not a clean general feature that
other people can also use.

Eric
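
For reference, the "no parent mount" condition that the quoted patch
description gives as the reason pivot_root fails on an initramfs can be
observed from userspace. What follows is a minimal sketch in C, assuming
the /proc/<pid>/mountinfo layout described in proc(5): the first field is
the mount ID, the second is the parent mount ID, and the root of the mount
tree lists its own ID as its parent. The names mount_ids, parse_mount_ids
and mount_has_parent are made up for this illustration; they are not
kernel or libc APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* First two fields of a /proc/<pid>/mountinfo line (see proc(5)). */
struct mount_ids {
    int id;         /* field 1: mount ID */
    int parent_id;  /* field 2: parent mount ID */
};

/*
 * Parse the two leading integer fields of one mountinfo line.
 * Returns false if the line does not start with two integers.
 */
static bool parse_mount_ids(const char *line, struct mount_ids *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%d %d", &out->id, &out->parent_id) == 2;
}

/*
 * Per proc(5), the root of the mount tree reports its own ID as its
 * parent ID. Such a mount has no parent to be detached from, which is
 * the situation the patch description gives for an initramfs root.
 */
static bool mount_has_parent(const struct mount_ids *m)
{
    return m->id != m->parent_id;
}
```

Feeding the line for "/" from /proc/self/mountinfo to parse_mount_ids()
and then calling mount_has_parent() gives a rough indication of whether
the current root sits on top of another mount. This is only a userspace
heuristic under the assumptions above, not the check the kernel itself
performs inside pivot_root.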