From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: Virtualizing /proc/sys/kernel/random/boot_id per container ?
Date: Thu, 30 Aug 2012 17:13:25 -0700
Message-ID: <87bohrhqai.fsf@xmission.com>
In-Reply-To: <20120830225002.GA9226-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> (Daniel P. Berrange's message of "Thu, 30 Aug 2012 15:50:02 -0700")
"Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>
>> > One of the features that SystemD folks have asked us to fix in LXC, is
>> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> > container is started.
>>
>> There may be a good reason for this. Most of what I have seen of
>> kernel requests from the direction of SystemD is that while there may
>> be a real problem, their imagined solution is usually not a
>> particularly good one. So a description of the problem is needed.
>>
>> Justifying something with just "SystemD wants this" is a good way to get
>> a nack.
>
> SystemD records log messages for all system services in their journal.
> They can show you all log messages for the current service execution,
> all log messages for a service since system boot, or all log messages
> ever. The boot_id value is used as a unique tag to allow grouping of
> the log messages per system boot. When we run systemd inside a container
> we want to get that grouping of log messages generated by services inside
> the container, to take account of the container boot, not the host boot.
> Hence the desire to have the boot_id value reflect when a container is
> booted.
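To make the grouping described above concrete, here is a minimal sketch. It is not SystemD's actual journal code: the (boot_id, message) record layout and the names read_boot_id and group_by_boot are assumptions made for illustration. The only detail taken from the thread is that each log record is tagged with the boot_id that was current when it was written.

```python
from collections import defaultdict

# Path discussed in this thread; passed as a parameter so it can be overridden.
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path=BOOT_ID_PATH):
    """Return the boot_id string found at `path`, without the trailing newline."""
    with open(path, "r", encoding="ascii") as f:
        return f.read().strip()


def group_by_boot(records):
    """Group (boot_id, message) pairs into {boot_id: [messages...]}.

    Hypothetical record layout: each record is a (boot_id, message) tuple,
    and messages keep their original order within each group.
    """
    groups = defaultdict(list)
    for boot_id, message in records:
        groups[boot_id].append(message)
    return dict(groups)
```

If a container only ever sees the host's boot_id, records from two separate container starts land in the same group, which is the behaviour being complained about.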

Since SystemD post-dates containers and since the logging feature is not
currently in wide use, that use case is completely non-persuasive.

So far this just sounds like a plain SystemD bug and something that can
be easily changed at this point in time.

It has been a long time but my fuzzy memory says that the original
boot_id justification was based on use cases that could not be solved
any other way.

My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
that inspired the implementation of boot_id. However, reading the
current emacs source code, it appears emacs gave up before boot_id
was implemented and stats /var/run/random-seed (which we seem to
have removed) or looks in wtmp or utmp for the latest boot record.

I did a quick grep through the binaries on my system and I could not
find anything using /proc/sys/kernel/random/boot_id.

That suggests to me that the proper solution is to actually just remove
boot_id.
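
A quick grep of that kind can be approximated as follows. This is a sketch under assumptions: the function name find_users, the default needle, and the choice of directories are illustrative, and a byte-string match only finds files that embed the path literally (it misses programs that build the path at run time).

```python
import os


def find_users(roots, needle=b"random/boot_id"):
    """Return the sorted paths of regular files under `roots` containing `needle`."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip FIFOs, sockets and dangling symlinks; opening them could block or fail.
                if not os.path.isfile(path):
                    continue
                try:
                    with open(path, "rb") as f:
                        if needle in f.read():
                            hits.append(path)
                except OSError:
                    continue  # unreadable file; ignore it
    return sorted(hits)
```

For example, find_users(["/usr/bin", "/usr/lib"]) scans two common binary directories; which directories are worth scanning depends on the system.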

Hmm. And then there is another interesting detail. What should boot_id
return after the processes have migrated from one system to another?

>> > The current semantics are that this file produces a new random UUID each
>> > time the host OS is booted. Obviously each time we start a container now,
>> > they just see the host's random boot_id, so from a container's POV this
>> > does not change each time it starts.
>>
>> That is correct. As I recall the contract with boot_id is to provide
>> a unique per boot value to assist in dealing with boots etc. I seem
>> to recall emacs uses the combination of hostname+boot_id to help
>> generate unique lock files names.
>>
>> I would definitely need a refresher on how boot_id is used in practice
>> by applications other than SystemD before I could suggest a good design.
>>
>> There is also a question of uptime.
>
> Agreed, as you say, this is one of many other /proc values needing
> virtualizing for container.

If you think of it as virtualization and you figure the requirement is
to exactly replicate a non-containerized system, you won't come up with
suggestions that make sense to implement.

For the most part the semantics of namespaces exist to support process
migration.

>> > There seems to be general agreement that, aside from the PID directories,
>> > changes to data in proc should be done by a FUSE filesystem overlay of
>> > some kind.
>>
>> No. I have yet to see a justification for using FUSE in containers on
>> top of proc files.
>>
>> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
>> instead of providing a proper mechanism to tell applications how
>> parallel they can/should be.
>>
>> For hacks and controversial ideas FUSE is good because it makes it
>> someone else's problem and it means it isn't something we have to
>> support in the kernel for the indefinite future. At the same time, in
>> general a FUSE solution does not really solve anything; it just sort of
>> papers over a problem.
>>
>> For some problems papering over them is good enough; other problems
>> really should be solved properly.
>
> Ok, well I guess things aren't as clear cut as I understood then. I've
> been told that FUSE was the desired approach to dealing with all the
> various files in /proc that might need changing for containers. Personally
> I don't much care what approach is used - if the kernel wants to do more
> stuff that's fine with me from a libvirt LXC POV. I'll just follow whatever
> the consensus is in this area.

Largely what I have seen is a bunch of half-thought-out hacks and the
consensus being (ick don't bother me...). In which case FUSE is a good
answer as it doesn't obligate anyone to maintain or care about the code,
except those who want the hack.

>> > We could use that mechanism to fix 'boot_id' in userspace, but
>> > I'm wondering if this is a better candidate for dealing with in kernel
>> > space, since as well as the /proc/sys tree, the data is also visible via
>> > the sysctl() system call which a FUSE overlay won't address.
>>
>> Any application that uses the sysctl() system call needs to be fixed.
>> When I looked years ago the number of applications using sysctl() could
>> be numbered on one hand and most of those applications were the fedora
>> installer, and the fedora installer hasn't used sysctl.
>
> Ok, I did wonder whether anyone would actually use sysctl() instead
> of reading /proc/sys. If we can ignore the sysctl that gives us more
> options.

Most definitely. The warning sysctl spews when you figure out how
to call it should be a good clue.

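For an application that still calls sysctl(), the fix is to read the corresponding file under /proc/sys instead. A minimal sketch, assuming the usual convention that a dotted sysctl name maps to a path by turning dots into slashes; the helper names and the proc_sys parameter are hypothetical.

```python
import os


def sysctl_path(name, proc_sys="/proc/sys"):
    """Map a dotted name such as 'kernel.random.boot_id' to its /proc/sys path.

    Assumes no single component contains a dot (some network interface
    names do), so this covers only the simple case.
    """
    return os.path.join(proc_sys, *name.split("."))


def read_sysctl(name, proc_sys="/proc/sys"):
    """Read a sysctl value as text from its file rather than via the sysctl() call."""
    with open(sysctl_path(name, proc_sys), "r") as f:
        return f.read().strip()
```

With this, read_sysctl("kernel.random.boot_id") reads the same file discussed in this thread, so whatever is done to that file is what the application sees.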
>> > The kernel doesn't have a real concept of a 'container' to associate
>> > a boot_id value with as such, but maybe it is reasonable to associate
>> > a boot_id value with each PID namespace ?
>>
>> There is also the question of uptime and clocks and things like that.
>>
>> The UTS namespace might be a more reasonable place to tack on that kind
>> of extended functionality.
>>
>> Just changing boot_id itself and not all of the other bits that track
>> when we have booted does not seem reasonable.
>>
>> Once we can sort out the details a kernel implementation should be quite
>> trivial. It just requires the appropriate sysctl registration dance.
>
> Ok, I'll try to identify a list of other related parts which need changing
> wrt boot.
>
> Thanks for the feedback.

I hope it helps.

There may be a justification and a good case for messing with boot_id
but I don't currently see it.

What I see (so far) is SystemD unnecessarily tying itself to Linux
implementation details.

Eric