Linux Container Development
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
Date: Tue, 04 Sep 2012 12:46:05 -0700
Message-ID: <87vcft1shu.fsf@xmission.com>
In-Reply-To: <20120904171818.GA5334-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> (Serge E. Hallyn's message of "Tue, 4 Sep 2012 17:18:18 +0000")

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
>> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> >> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>> >>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >>>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> >>>>
>> >>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>> >>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> >>>>>>
>> >>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> >>>>>>>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>> >>>>>>>>
>> >>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>> >>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> >>>>>>>>> container is started.
>> >>>>>>>>
>> >>>>>>>> There may be a good reason for this.  Most of the time what I have seen
>> >>>>>>>> of kernel requests from the direction of SystemD is that while there may
>> >>>>>>>> be a real problem, their imagined solution is usually not a
>> >>>>>>>> particularly good solution.  So a description of the problem is needed.
>> >>>>>>>>
>> >>>>>>>> Justifying something with just "SystemD wants this" is a good way to get
>> >>>>>>>> a nack.
>> >>>>>>>
>> >>>>>>> SystemD records log messages for all system services in their journal.
>> >>>>>>> They can show you all log messages for the current service execution,
>> >>>>>>> all log messages for a service since system boot, or all log messages
>> >>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>> >>>>>>> the log messages per system boot. When we run systemd inside a container
>> >>>>>>> we want to get that grouping of log messages generated by services inside
>> >>>>>>> the container, to take account of the container boot, not the host boot.
>> >>>>>>> Hence the desire to have the boot_id value reflect when a container is
>> >>>>>>> booted.
>> >>>>>>
>> >>>>>> Since SystemD post-dates containers and since the logging feature is not
>> >>>>>> currently in wide use that use case is completely non-persuasive.
>> >>>>>>
>> >>>>>> So far this just sounds like a plain SystemD bug and something that can
>> >>>>>> be easily changed at this point in time.
>> >>>>>>
>> >>>>>> It has been a long time but my fuzzy memory says that the original
>> >>>>>> boot_id justification was based on use cases that could not be solved
>> >>>>>> any other way.
>> >>>>>>
>> >>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>> >>>>>> that inspired the implementation of boot_id.  However reading the
>> >>>>>> current emacs source code it appears emacs gave up before boot_id
>> >>>>>> was implemented and stats /var/run/random-seed (which we seem to
>> >>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>> >>>>>>
>> >>>>>> I did a quick grep through the binaries on my system and I could not
>> >>>>>> find anything using /proc/sys/random/boot_id.
>> >>>>>>
>> >>>>>> That suggests to me that the proper solution is to actually just remove
>> >>>>>> boot_id.
>> >>>>>>
>> >>>>>> Hmm.  And then there is another interesting detail.  What should boot_id
>> >>>>>> return after the processes have migrated from one system to another?
>> >>>>>>
>> >>>>>
>> >>>>> Since this would be a per-boot id, this clearly has to be carried over
>> >>>>> with migration, along with all the tons of data we already carry.
>> >>>>
>> >>>> The twist of course is what a boot means.  If we are really after
>> >>>> machine boots then the current behavior is correct.
>> >>>>
>> >>>> Looking back in the archives the desired behavior appears to be a value
>> >>>> that can be used to see if a pid value must be stale.
>> >>>>
>> >>>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>> >>>> reused.
>> >>>>
>> >>>> Still a role as a stale pid detector makes it clear which namespace
>> >>>> boot_id should be in and how we should treat boot_id upon migration.
>> >>>>
>> >>>> You can only serve as a stale pid detector if you are in the pid
>> >>>> namespace.
>> >>>>
>> >>>> So at this point patches are welcome.  Hopefully with a summary
>> >>>> of the discussion.
>> >>>
>> >>> I don't understand why this should be provided by the kernel.  Especially
>> >>> given that we've proven that everyone really wants this to be per-container
>> >>> as well.
>> >>>
>> >>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>> >>> by sha1summing the time at which it started plus some nonce?
>> >>>
>> >> "Why shouldn't it be provided by the kernel?" is the real question.
>> > 
>> > Because it's not the right place.  The origin of this thread proves that
>> > people want a per-init, not per-kernel, value.
>> > 
>> 
>> Not all files provided by the kernel are "per-kernel". /proc/self is
>> full of per-namespace stuff.
>> 
>> >> The way I see it, every file we need to set up from the outside is a
>> >> hassle. Among many other things, it is just asking for duplication of
>> >> efforts among multiple userspaces.
>> >>
>> >> netns does this for its proc files. The only reason we don't do it for
>> >> cgroups-driven files is that the semantics are very ill-defined. For this
>> >> file, it doesn't seem to be the case.
>> > 
>> > But it is the case.  How do you intend to have the kernel decide what
>> > value to put in there for a process in a container, or in a chroot?
>> > 
>> 
>> one value per pidns.
>
> ok.  (So should it be called /proc/pidns_uuid?  Well, whatever.  No
> objection from me - thanks.)

/proc/sys/kernel/boot_id.

Someday we will get the plumbing right in the kernel so that this can be
/proc/sys -> /proc/self/sys and /proc/self/sys/kernel/boot_id.

The origin of boot_id was so that emacs could implement distributed
locking in userspace by creating a symlink from .#filename to 
user-WI0L6dQK/Vr7saj2s7cPmQ@public.gmane.org:boot_id.

Ultimately emacs opted to just stat /var/run/random-seed or to grovel
through utmp or wtmp to find the last boot record.

Of course /var/run/random-seed is now named something like
/var/lib/urandom/random-seed as distributions continue their relentless
pursuit to break userspace.

But ultimately boot_id was defined as something you can use to detect
stale pids and stale lockfiles.  Since the original definition was
a uuid to detect stale pids, that seems a reasonable justification
for keeping it in the pid_namespace.  Boot_id isn't the best name in
that case but shrug.
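
To make the stale-lockfile use concrete, here is a rough sketch in Python
of how a userspace program might use boot_id.  It is only an illustration,
not what emacs or any other program actually does: the token layout
"user@host.pid:boot_id" and all function names are assumptions made up for
this example.  The path read is /proc/sys/kernel/random/boot_id, as in the
subject of this thread.  Note that a matching boot_id proves nothing on
its own, because pids can still be reused within a boot; only a mismatch
is conclusive.

```python
import os

# Path from the subject of this thread; the value is a UUID string.
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path=BOOT_ID_PATH):
    """Return the boot_id visible to the calling process."""
    with open(path) as f:
        return f.read().strip()


def make_lock_token(user, host, pid, boot_id):
    """Build a lock token (hypothetical layout: user@host.pid:boot_id)."""
    return f"{user}@{host}.{pid}:{boot_id}"


def parse_lock_token(token):
    """Split a token into (user@host, pid, boot_id)."""
    owner, boot_id = token.rsplit(":", 1)
    user_host, pid = owner.rsplit(".", 1)
    return user_host, int(pid), boot_id


def pid_alive(pid):
    """Best-effort liveness check using signal 0 (sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to someone else
    return True


def lock_is_stale(token, current_boot_id, alive=pid_alive):
    """A lock is stale if it predates this boot or its owner is gone."""
    _, pid, boot_id = parse_lock_token(token)
    if boot_id != current_boot_id:
        # Written under a different boot_id: the pid cannot be trusted.
        return True
    # Same boot_id: fall back to a liveness check, which pid reuse can fool.
    return not alive(pid)
```

If boot_id were kept per pid namespace, read_boot_id() inside a container
would return that container's value, so the mismatch test above would
fire after a container restart and not only after a host reboot.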

Eric
