From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
Date: Wed, 5 Sep 2012 11:59:18 +0400
Message-ID: <50470656.8080300@parallels.com>
References: <20120830211832.GA3297@redhat.com> <878vcwjabu.fsf@xmission.com>
 <20120830225002.GA9226@redhat.com> <87bohrhqai.fsf@xmission.com>
 <5044629C.3030909@parallels.com> <87r4qi6g6k.fsf@xmission.com>
 <20120904144428.GB14093@amd1> <50461421.7030305@parallels.com>
 <20120904152526.GB19564@amd1> <50461EBB.2050501@parallels.com>
 <20120904171818.GA5334@mail.hallyn.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: "Serge E. Hallyn"
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, "Eric W. Biederman"
List-Id: containers.vger.kernel.org

On 09/04/2012 09:18 PM, Serge E. Hallyn wrote:
> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
>>> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>>>> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>>>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>>>> Glauber Costa writes:
>>>>>>
>>>>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>>>>>>> "Daniel P. Berrange" writes:
>>>>>>>>
>>>>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>>>>>>> "Daniel P. Berrange" writes:
>>>>>>>>>>
>>>>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>>>>>>> container is started.
>>>>>>>>>>
>>>>>>>>>> There may be a good reason for this. Most of the time what I have seen
>>>>>>>>>> of kernel requests from the direction of SystemD is that while there may
>>>>>>>>>> be a real problem, usually their imagined solution is not a
>>>>>>>>>> particularly good solution. So a description of the problem is needed.
>>>>>>>>>>
>>>>>>>>>> Justifying something with just "SystemD wants this" is a good way to get
>>>>>>>>>> a nack.
>>>>>>>>>
>>>>>>>>> SystemD records log messages for all system services in their journal.
>>>>>>>>> They can show you all log messages for the current service execution,
>>>>>>>>> all log messages for a service since system boot, or all log messages
>>>>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>>>>>>> the log messages per system boot. When we run systemd inside a container
>>>>>>>>> we want to get that grouping of log messages generated by services inside
>>>>>>>>> the container, to take account of the container boot, not the host boot.
>>>>>>>>> Hence the desire to have the boot_id value reflect when a container is
>>>>>>>>> booted.
>>>>>>>>
>>>>>>>> Since SystemD post-dates containers and since the logging feature is not
>>>>>>>> currently in wide use that use case is completely non-persuasive.
>>>>>>>>
>>>>>>>> So far this just sounds like a plain SystemD bug and something that can
>>>>>>>> be easily changed at this point in time.
>>>>>>>>
>>>>>>>> It has been a long time but my fuzzy memory says that the original
>>>>>>>> boot_id justification was based on use cases that could not be solved
>>>>>>>> any other way.
>>>>>>>>
>>>>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>>>>>>> that inspired the implementation of boot_id.
>>>>>>>> However reading the
>>>>>>>> current emacs source code it appears emacs gave up before boot_id
>>>>>>>> was implemented and stats /var/run/random-seed (which we seem to
>>>>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>>>>>>>>
>>>>>>>> I did a quick grep through the binaries on my system and I could not
>>>>>>>> find anything using /proc/sys/random/boot_id.
>>>>>>>>
>>>>>>>> That suggests to me that the proper solution is to actually just remove
>>>>>>>> boot_id.
>>>>>>>>
>>>>>>>> Hmm. And then there is another interesting detail. What should boot_id
>>>>>>>> return after the processes have migrated from one system to another?
>>>>>>>>
>>>>>>>
>>>>>>> Since this would be a per-boot id, this clearly has to be carried over
>>>>>>> with migration, along with all the tons of data we already carry.
>>>>>>
>>>>>> The twist of course is what does a boot mean. If we are really after
>>>>>> machine boots then the current behavior is correct.
>>>>>>
>>>>>> Looking back in the archives the desired behavior appears to be a value
>>>>>> that can be used to see if a pid value must be stale.
>>>>>>
>>>>>> As a stale pid detector boot_id is pretty lousy. Pids can still be
>>>>>> reused.
>>>>>>
>>>>>> Still a role as a stale pid detector makes it clear which namespace
>>>>>> boot_id should be in and how we should treat boot_id upon migration.
>>>>>>
>>>>>> You can only serve as a stale pid detector if you are in the pid
>>>>>> namespace.
>>>>>>
>>>>>> So at this point patches are welcome. Hopefully with a summary
>>>>>> of the discussion.
>>>>>
>>>>> I don't understand why this should be provided by the kernel. Especially
>>>>> given that we've proven that everyone really wants this to be per-container
>>>>> as well.
>>>>>
>>>>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>>>>> by sha1summing the time at which it started perhaps plus some nonce?
>>>>>
>>>> Why shouldn't it be provided by the kernel? is the real question
>>>
>>> Because it's not the right place. The origin of this thread proves that
>>> people want a per-init, not per-kernel, value.
>>>
>>
>> Not all files provided by the kernel are "per-kernel". /proc/self is
>> full of per-namespace stuff.
>>
>>>> The way I see it, every file we need to setup from the outside is a
>>>> hassle. Among many other things, it is just asking for duplication of
>>>> efforts among multiple userspaces.
>>>>
>>>> netns does this for its proc files. The only reason we don't do it for
>>>> cgroups-driven files is that the semantics are very ill-defined. For this
>>>> file, it doesn't seem to be the case.
>>>
>>> But it is the case. How do you intend to have the kernel decide what
>>> value to put in there for a process in a container, or in a chroot?
>>>
>>
>> one value per pidns.
>
> ok. (So should it be called /proc/pidns_uuid? Well, whatever. No
> objection from me - thanks.)
>
> -serge
>

For completeness, I believe it should live in the same place it lives
today, and become a symlink from /proc/self/boot_id, consistent with
what we have today for other values like this.
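
To make the two alternatives in this thread concrete, here is a minimal
userspace sketch in Python. It is only an illustration, not a patch: the
function names read_kernel_boot_id and make_init_boot_id are hypothetical,
and the hashing scheme is just one reading of the "sha1sum the start time
plus some nonce" idea quoted above. It says nothing about how a per-pidns
value would be implemented in the kernel.

```python
import hashlib
import os
import time
import uuid

# Path discussed in this thread; today it holds one value per kernel boot.
KERNEL_BOOT_ID = "/proc/sys/kernel/random/boot_id"


def read_kernel_boot_id(path=KERNEL_BOOT_ID):
    """Return the boot_id string from procfs, or None if it cannot be read."""
    try:
        with open(path, "r", encoding="ascii") as f:
            return f.read().strip()
    except OSError:
        return None


def make_init_boot_id(start_time=None, nonce=None):
    """Derive a UUID-shaped id from init's start time plus a nonce.

    Hypothetical helper: an init process could call this once at startup
    and store the result somewhere like /run/boot_id (assumed location,
    taken from the suggestion in the thread).
    """
    if start_time is None:
        start_time = time.time()
    if nonce is None:
        nonce = os.urandom(16)
    digest = hashlib.sha1(repr(start_time).encode("ascii") + nonce).digest()
    # SHA-1 yields 20 bytes; keep the first 16 so the result has the same
    # textual shape as a UUID. The UUID version bits are not set here.
    return str(uuid.UUID(bytes=digest[:16]))


if __name__ == "__main__":
    print("kernel boot_id:", read_kernel_boot_id())
    print("per-init id:   ", make_init_boot_id())
```

The userspace variant needs every init implementation to agree on the file
location and format, which is the duplication-of-effort concern raised
above; the procfs variant keeps a single interface for all of them.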