All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: menglong8.dong@gmail.com
Cc: jack@suse.cz, axboe@kernel.dk, hare@suse.de,
	gregkh@linuxfoundation.org, tj@kernel.org,
	dong.menglong@zte.com.cn, song@kernel.org, neilb@suse.de,
	akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
	f.fainelli@gmail.com, arnd@arndb.de, brho@google.com,
	linux@rasmusvillemoes.dk, mhiramat@kernel.org,
	rostedt@goodmis.org, keescook@chromium.org, vbabka@suse.cz,
	glider@google.com, pmladek@suse.com, chris@chrisdown.name,
	ebiederm@xmission.com, jojing64@gmail.com,
	linux-kernel@vger.kernel.org, palmerdabbelt@google.com,
	linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root
Date: Thu, 20 May 2021 21:41:11 +0000	[thread overview]
Message-ID: <20210520214111.GV4332@42.do-not-panic.com> (raw)
In-Reply-To: <20210520154244.20209-1-dong.menglong@zte.com.cn>

On Thu, May 20, 2021 at 11:42:44PM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <dong.menglong@zte.com.cn>
> 
> During the kernel initialization, the mount tree, which is used by the
> kernel, is created, and 'rootfs' is mounted as the root file system.
> 
> While using initramfs as the root file system, cpio file is unpacked
> into the rootfs. Thus, this rootfs is exactly what users see in user
> space, and some problems arose: this rootfs has no parent mount,
> which make it can't be umounted or pivot_root.
> 
> 'pivot_root' is used to change the rootfs and clean the old mountpoints,
> and it is essential for some users, such as docker. Docker use
> 'pivot_root' to change the root fs of a process if the current root
> fs is a block device of initrd. However, when it comes to initramfs,
> things is different: docker has to use 'chroot()' to change the root
> fs, as 'pivot_root()' is not supported in initramfs.

OK so this seems to be a long winded way of saying you can't efficiently
use containers today when using initramfs. And then you explain why.

> The usage of 'chroot()' to create root fs for a container introduced
> a lot problems.
> 
> First, 'chroot()' can't clean the old mountpoints which inherited
> from the host. It means that the mountpoints in host will have a
> duplicate in every container. Let's image that there are 100
> containers in host, and there will be 100 duplicates of every
> mountpoints, which makes resource release an issue. User may
> remove a USB after he (or she) umount it successfully in the
> host. However, the USB may still be mounted in containers, although
> it can't be seen by the 'mount' commond in the container. This
> means the USB is not released yet, and data may not write back.
> Therefore, data lose arise.
> 
> Second, net-namespace leak is another problem. The net-namespace
> of containers will be mounted in /var/run/docker/netns/ in host
> by dockerd. It means that the net-namespace of a container will
> be mounted in containers which are created after it. Things
> become worse now, as the net-namespace can't be remove after
> the destroy of that container, as it is still mounted in other
> containers. If users want to recreate that container, he will
> fail if a certain mac address is to be binded with the container,
> as it is not release yet.

That seems like a chicken and egg problem on Docker, in that...

> Maybe dockerd can umount the unnecessary mountpoints that inherited
> for the host before do 'chroot()',

Can't docker instead allow to create containers prior to creating
your local docker network namespace? Not that its a great solution,
but just worth noting.

> but that is not a graceful way.
> I think the best way is to make 'pivot_root()' support initramfs.
> 
> After this patch, initramfs is supported by 'pivot_root()' perfectly.
> I just create a new rootfs and mount it to the mount-tree before
> unpack cpio. Therefore, the rootfs used by users has a parent mount,
> and can use 'pivot_root()'.
> 
> What's more, after this patch, 'rootflags' in boot cmd is supported
> by initramfs. Therefore, users can set the size of tmpfs with
> 'rootflags=size=1024M'.
> 
> Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
> ---
>  init/do_mounts.c | 53 +++++++++++++++++++++++++++++++++++++++---------
>  init/do_mounts.h |  1 +
>  init/initramfs.c | 32 +++++++++++++++++++++++++++++
>  init/main.c      | 17 +++++++++++++++-
>  4 files changed, 92 insertions(+), 11 deletions(-)
> 
> diff --git a/init/do_mounts.c b/init/do_mounts.c
> index a78e44ee6adb..a156b0d28b43 100644
> --- a/init/do_mounts.c
> +++ b/init/do_mounts.c
> @@ -459,7 +459,7 @@ void __init mount_block_root(char *name, int flags)
>  out:
>  	put_page(page);
>  }
> - 
> +
>  #ifdef CONFIG_ROOT_NFS
>  
>  #define NFSROOT_TIMEOUT_MIN	5
> @@ -617,24 +617,57 @@ void __init prepare_namespace(void)
>  	init_chroot(".");
>  }
>  
> -static bool is_tmpfs;
> -static int rootfs_init_fs_context(struct fs_context *fc)
> +#ifdef CONFIG_TMPFS
> +static __init bool is_tmpfs_enabled(void)
> +{
> +	return (!root_fs_names || strstr(root_fs_names, "tmpfs")) &&
> +	       !saved_root_name[0];
> +}
> +#endif
> +
> +static __init bool is_ramfs_enabled(void)
>  {
> -	if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
> -		return shmem_init_fs_context(fc);
> +	return true;
> +}
> +
> +struct fs_user_root {
> +	bool (*enabled)(void);
> +	char *dev_name;
> +	char *fs_name;
> +};
>  
> -	return ramfs_init_fs_context(fc);
> +static struct fs_user_root user_roots[] __initdata = {
> +#ifdef CONFIG_TMPFS
> +	{.fs_name = "tmpfs", .enabled = is_tmpfs_enabled },
> +#endif
> +	{.fs_name = "ramfs", .enabled = is_ramfs_enabled }
> +};
> +static struct fs_user_root * __initdata user_root;
> +
> +int __init mount_user_root(void)
> +{
> +	return do_mount_root(user_root->dev_name,
> +			     user_root->fs_name,
> +			     root_mountflags,
> +			     root_mount_data);
>  }
>  
>  struct file_system_type rootfs_fs_type = {
>  	.name		= "rootfs",
> -	.init_fs_context = rootfs_init_fs_context,
> +	.init_fs_context = ramfs_init_fs_context,

Why is this always static now? Why is that its correct
now for init_mount_tree() always to use the ramfs context?

  Luis

  reply	other threads:[~2021-05-20 21:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-20 15:42 [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root menglong8.dong
2021-05-20 21:41 ` Luis Chamberlain [this message]
2021-05-21  0:41   ` Menglong Dong
2021-05-21 15:50     ` Luis Chamberlain
2021-05-22  4:09       ` Menglong Dong
2021-05-24 22:58         ` Luis Chamberlain
2021-05-25  0:55           ` Menglong Dong
2021-05-25  1:43             ` Luis Chamberlain
2021-05-25  6:09               ` Menglong Dong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210520214111.GV4332@42.do-not-panic.com \
    --to=mcgrof@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=brho@google.com \
    --cc=chris@chrisdown.name \
    --cc=dong.menglong@zte.com.cn \
    --cc=ebiederm@xmission.com \
    --cc=f.fainelli@gmail.com \
    --cc=glider@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=jojing64@gmail.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=menglong8.dong@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=neilb@suse.de \
    --cc=palmerdabbelt@google.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.