From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: Menglong Dong
Cc: Luis Chamberlain, Josh Triplett, Alexander Viro, Kees Cook,
	Sami Tolvanen, ojeda@kernel.org, johan@kernel.org, Bjorn Helgaas,
	masahiroy@kernel.org, Menglong Dong, joe@perches.com, Jens Axboe,
	hare@suse.de, Jan Kara, tj@kernel.org, gregkh@linuxfoundation.org,
	song@kernel.org, NeilBrown, Andrew Morton, f.fainelli@gmail.com,
	arnd@arndb.de, Rasmus Villemoes, wangkefeng.wang@huawei.com,
	Barret Rhoden, mhiramat@kernel.org, Steven Rostedt, vbabka@suse.cz,
	Alexander Potapenko, pmladek@suse.com, Chris Down, jojing64@gmail.com,
	terrelln@fb.com, geert@linux-m68k.org, mingo@kernel.org,
	linux-fsdevel@vger.kernel.org, LKML, jeyu@kernel.org
References: <20210525141524.3995-1-dong.menglong@zte.com.cn>
	<20210525141524.3995-3-dong.menglong@zte.com.cn>
Date: Tue, 25 May 2021 22:23:09 -0500
In-Reply-To: (Menglong Dong's message of "Wed, 26 May 2021 09:51:22 +0800")
Subject: Re: [PATCH v2 2/3] init/do_cmounts.c: introduce 'user_root' for initramfs
X-Mailing-List: linux-kernel@vger.kernel.org

Menglong Dong writes:

> On Wed, May 26, 2021 at 2:50 AM Eric W. Biederman wrote:
>>
> ......
>>
>> What is the flow where docker uses an initramfs?
>>
>> Just thinking about this I am not being able to connect the dots.
>>
>> The way I imagine the world is that an initramfs will be used either
>> when a linux system boots for the first time, or an initramfs would
>> come from the distribution you are running inside a container.
>> In neither case do I see docker being in a position to add functionality
>> to the initramfs as docker is not responsible for it.
>>
>> Is docker doing something creating like running a container in a VM,
>> and running some directly out of the initramfs, and wanting that code
>> to exactly match the non-VM case?
>>
>> If that is the case I think the easy solution would be to actually use
>> an actual ramdisk where pivot_root works.
>
> In fact, nowadays, initramfs is widely used by embedded devices in the
> production environment, which makes the whole system run in ram.
>
> That make sense. First, running in ram will speed up the system. The size
> of the system won't be too large for embedded devices, which makes this
> idea work. Second, this will reduce the I/O of disk devices, which can
> extend the life of the disk. Third, RAM is getting cheaper.
>
> So in this scene, Docker runs directly in initramfs.

That is the piece of the puzzle I was missing. A small system with its
root in an initramfs.

>> I really don't see why it makes sense for docker to be a special
>> snowflake and require kernel features that no other distribution does.
>>
>> It might make sense to create a completely empty filesystem underneath
>> an initramfs, and use that new rootfs as the unchanging root of the
>> mount tree, if it can be done with a trivial amount of code, and
>> generally make everything cleaner.
>>
>> As this change sits it looks like a lot of code to handle a problem
>> in the implementation of docker. Which quite frankly will be a pain
>> to have to maintain if this is not a clean general feature that
>> other people can also use.
>>
>
> I don't think that it's all for docker, pivot_root may be used by other
> users in the above scene. It may work to create an empty filesystem, as you
> mentioned above. But I don't think it's a good idea to make all users,
> who want to use pivot_root, do that. After all, it's not friendly to
> users.
>
> As for the code, it may look a lot, but it's not complex. Maybe a clean
> up for the code I add can make it better?

If we are going to do this, it should be something so small and clean
that it can be done unconditionally, always.

I will see if I can dig in and look a little more. I think there is
a reason Al Viro and H. Peter Anvin implemented initramfs this way.
Perhaps it was just a desire to make pivot_root unnecessary.

Container filesystem setup does throw a bit of a wrench in the works
as, unlike an initramfs where you can just delete everything, there is
not a clean way to get rid of a root filesystem you don't need without
pivot_root.

The net request as I understand it: Make the filesystem the initramfs
lives in be an ordinary filesystem so it can just be used as the
system's primary filesystem.

There might be technical reasons why that is a bad idea and userspace
would be requested to move everything into another ramfs manually (which
would have the same effect). But it is worth taking a good look to see
if it can be accomplished cleanly.

Eric
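
Editorial illustration, not part of the patch under discussion: the
constraint this thread circles around is that pivot_root(2) refuses to
operate (EINVAL) while the current root is the initial rootfs mount. A
minimal sketch of how userspace might detect that situation is below. It
assumes the text it is given has the /proc/self/mountinfo layout and that
the initial root is reported with filesystem type "rootfs"; the function
names and the two sample strings are hypothetical and made up for the
example.

```python
def root_fstype(mountinfo_text):
    """Return the filesystem type of the mount visible at "/", or None.

    Each mountinfo line has the shape:
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [optional...] - FSTYPE SOURCE SUPEROPTS
    """
    fstype = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a well-formed mountinfo line
        sep = fields.index("-")
        if sep < 5 or sep + 1 >= len(fields):
            continue
        if fields[4] == "/":
            # Later entries are stacked on top of earlier ones, so the
            # last mount at "/" is the one that is actually visible.
            fstype = fields[sep + 1]
    return fstype


def pivot_root_would_be_refused(mountinfo_text):
    """Heuristic: True when "/" is still the initial rootfs mount."""
    return root_fstype(mountinfo_text) == "rootfs"


# Hypothetical sample inputs, not captured from a real machine.
SAMPLE_INITRAMFS_ROOT = (
    "1 1 0:2 / / rw - rootfs rootfs rw\n"
    "20 1 0:19 / /proc rw,relatime - proc proc rw\n"
)
SAMPLE_DISK_ROOT = "25 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw\n"

print(pivot_root_would_be_refused(SAMPLE_INITRAMFS_ROOT))
print(pivot_root_would_be_refused(SAMPLE_DISK_ROOT))
```

On a live system the same functions could be fed the contents of
/proc/self/mountinfo. When the check reports the initial rootfs, the
manual workaround mentioned above (moving everything into another ramfs
and switching to it) is what userspace has to fall back on today.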