linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: menglong8.dong@gmail.com
Cc: jack@suse.cz, axboe@kernel.dk, hare@suse.de,
	gregkh@linuxfoundation.org, tj@kernel.org,
	dong.menglong@zte.com.cn, song@kernel.org, neilb@suse.de,
	akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
	f.fainelli@gmail.com, arnd@arndb.de, brho@google.com,
	linux@rasmusvillemoes.dk, mhiramat@kernel.org,
	rostedt@goodmis.org, keescook@chromium.org, vbabka@suse.cz,
	glider@google.com, pmladek@suse.com, chris@chrisdown.name,
	ebiederm@xmission.com, jojing64@gmail.com,
	linux-kernel@vger.kernel.org, palmerdabbelt@google.com,
	linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root
Date: Thu, 20 May 2021 21:41:11 +0000	[thread overview]
Message-ID: <20210520214111.GV4332@42.do-not-panic.com> (raw)
In-Reply-To: <20210520154244.20209-1-dong.menglong@zte.com.cn>

On Thu, May 20, 2021 at 11:42:44PM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <dong.menglong@zte.com.cn>
> 
> During the kernel initialization, the mount tree, which is used by the
> kernel, is created, and 'rootfs' is mounted as the root file system.
> 
> While using initramfs as the root file system, cpio file is unpacked
> into the rootfs. Thus, this rootfs is exactly what users see in user
> space, and some problems arose: this rootfs has no parent mount,
> which make it can't be umounted or pivot_root.
> 
> 'pivot_root' is used to change the rootfs and clean the old mountpoints,
> and it is essential for some users, such as docker. Docker use
> 'pivot_root' to change the root fs of a process if the current root
> fs is a block device of initrd. However, when it comes to initramfs,
> things is different: docker has to use 'chroot()' to change the root
> fs, as 'pivot_root()' is not supported in initramfs.

OK so this seems to be a long winded way of saying you can't efficiently
use containers today when using initramfs. And then you explain why.

> The usage of 'chroot()' to create root fs for a container introduced
> a lot problems.
> 
> First, 'chroot()' can't clean the old mountpoints which inherited
> from the host. It means that the mountpoints in host will have a
> duplicate in every container. Let's image that there are 100
> containers in host, and there will be 100 duplicates of every
> mountpoints, which makes resource release an issue. User may
> remove a USB after he (or she) umount it successfully in the
> host. However, the USB may still be mounted in containers, although
> it can't be seen by the 'mount' commond in the container. This
> means the USB is not released yet, and data may not write back.
> Therefore, data lose arise.
> 
> Second, net-namespace leak is another problem. The net-namespace
> of containers will be mounted in /var/run/docker/netns/ in host
> by dockerd. It means that the net-namespace of a container will
> be mounted in containers which are created after it. Things
> become worse now, as the net-namespace can't be remove after
> the destroy of that container, as it is still mounted in other
> containers. If users want to recreate that container, he will
> fail if a certain mac address is to be binded with the container,
> as it is not release yet.

That seems like a chicken and egg problem on Docker, in that...

> Maybe dockerd can umount the unnecessary mountpoints that inherited
> for the host before do 'chroot()',

Can't docker instead allow to create containers prior to creating
your local docker network namespace? Not that its a great solution,
but just worth noting.

> but that is not a graceful way.
> I think the best way is to make 'pivot_root()' support initramfs.
> 
> After this patch, initramfs is supported by 'pivot_root()' perfectly.
> I just create a new rootfs and mount it to the mount-tree before
> unpack cpio. Therefore, the rootfs used by users has a parent mount,
> and can use 'pivot_root()'.
> 
> What's more, after this patch, 'rootflags' in boot cmd is supported
> by initramfs. Therefore, users can set the size of tmpfs with
> 'rootflags=size=1024M'.
> 
> Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
> ---
>  init/do_mounts.c | 53 +++++++++++++++++++++++++++++++++++++++---------
>  init/do_mounts.h |  1 +
>  init/initramfs.c | 32 +++++++++++++++++++++++++++++
>  init/main.c      | 17 +++++++++++++++-
>  4 files changed, 92 insertions(+), 11 deletions(-)
> 
> diff --git a/init/do_mounts.c b/init/do_mounts.c
> index a78e44ee6adb..a156b0d28b43 100644
> --- a/init/do_mounts.c
> +++ b/init/do_mounts.c
> @@ -459,7 +459,7 @@ void __init mount_block_root(char *name, int flags)
>  out:
>  	put_page(page);
>  }
> - 
> +
>  #ifdef CONFIG_ROOT_NFS
>  
>  #define NFSROOT_TIMEOUT_MIN	5
> @@ -617,24 +617,57 @@ void __init prepare_namespace(void)
>  	init_chroot(".");
>  }
>  
> -static bool is_tmpfs;
> -static int rootfs_init_fs_context(struct fs_context *fc)
> +#ifdef CONFIG_TMPFS
> +static __init bool is_tmpfs_enabled(void)
> +{
> +	return (!root_fs_names || strstr(root_fs_names, "tmpfs")) &&
> +	       !saved_root_name[0];
> +}
> +#endif
> +
> +static __init bool is_ramfs_enabled(void)
>  {
> -	if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
> -		return shmem_init_fs_context(fc);
> +	return true;
> +}
> +
> +struct fs_user_root {
> +	bool (*enabled)(void);
> +	char *dev_name;
> +	char *fs_name;
> +};
>  
> -	return ramfs_init_fs_context(fc);
> +static struct fs_user_root user_roots[] __initdata = {
> +#ifdef CONFIG_TMPFS
> +	{.fs_name = "tmpfs", .enabled = is_tmpfs_enabled },
> +#endif
> +	{.fs_name = "ramfs", .enabled = is_ramfs_enabled }
> +};
> +static struct fs_user_root * __initdata user_root;
> +
> +int __init mount_user_root(void)
> +{
> +	return do_mount_root(user_root->dev_name,
> +			     user_root->fs_name,
> +			     root_mountflags,
> +			     root_mount_data);
>  }
>  
>  struct file_system_type rootfs_fs_type = {
>  	.name		= "rootfs",
> -	.init_fs_context = rootfs_init_fs_context,
> +	.init_fs_context = ramfs_init_fs_context,

Why is this always static now? Why is that its correct
now for init_mount_tree() always to use the ramfs context?

  Luis

  reply	other threads:[~2021-05-20 21:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-20 15:42 [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root menglong8.dong
2021-05-20 21:41 ` Luis Chamberlain [this message]
2021-05-21  0:41   ` Menglong Dong
2021-05-21 15:50     ` Luis Chamberlain
2021-05-22  4:09       ` Menglong Dong
2021-05-24 22:58         ` Luis Chamberlain
2021-05-25  0:55           ` Menglong Dong
2021-05-25  1:43             ` Luis Chamberlain
2021-05-25  6:09               ` Menglong Dong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210520214111.GV4332@42.do-not-panic.com \
    --to=mcgrof@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=brho@google.com \
    --cc=chris@chrisdown.name \
    --cc=dong.menglong@zte.com.cn \
    --cc=ebiederm@xmission.com \
    --cc=f.fainelli@gmail.com \
    --cc=glider@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=jojing64@gmail.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=menglong8.dong@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=neilb@suse.de \
    --cc=palmerdabbelt@google.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).