From: Helge Deller <deller@gmx.de>
To: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, Theodore Tso <tytso@mit.edu>
Subject: Re: [PATCH] Time-based RFC 4122 UUID generator
Date: Tue, 20 Nov 2007 22:59:58 +0100
Message-ID: <200711202259.58745.deller@gmx.de>
In-Reply-To: <20071120063120.GD17536@waste.org>
On Tuesday 20 November 2007, Matt Mackall wrote:
> On Sun, Nov 18, 2007 at 10:40:34PM +0100, Helge Deller wrote:
> > On Sunday 18 November 2007, Andrew Morton wrote:
> > > On Sun, 18 Nov 2007 20:38:21 +0100 Helge Deller <deller@gmx.de> wrote:
> > >
> > > > Title: Add time-based RFC 4122 UUID generator
> > > >
> > > > The Linux kernel currently contains the generate_random_uuid()
> > > > function, which creates truly random UUIDs based on RFC 4122 and
> > > > provides them to userspace through /proc/sys/kernel/random/boot_id and
> > > > /proc/sys/kernel/random/uuid.
> > > >
> > > > This patch additionally adds the "Time-based UUID" variant of RFC 4122,
> > > > with which userspace applications can easily obtain truly unique
> > > > time-based UUIDs through /proc/sys/kernel/random/uuid_time.
> > > > A new /proc/sys/kernel/random/uuid_time_clockseq sysctl entry is available,
> > > > so that the clock_seq value can be retained across reboots (as
> > > > required by RFC 4122).
> > > >
> > > > The attached implementation uses getnstimeofday() to get very fine-grained
> > > > timestamps, so that userspace tools can obtain many more UUIDs per unit
> > > > of time than before (if needed).
> > > > A mutex provides the locking that prevents duplicate UUIDs from being
> > > > created for simultaneously running processes.
> >
> >
> > > Who will use this feature, and for what?
> > > (In fact, who uses the existing UUID generators, and for what?)
> >
> > Current users I know of (but there are more):
> > - e2fsprogs uses it, e.g. to create unique UUIDs for disks (it ships its own library for that)
> > - http://commons.apache.org/sandbox/id/uuid.html uses it with its own libraries
> > - SAP Netweaver on Linux uses it (http://www.sap.com/platform/netweaver/index.epx)
> >
> > I'm mostly interested in fixing problems I see with SAP (I work for SAP).
> > SAP Netweaver often needs lots of unique UUIDs within a very short time frame
> > (to reference the data afterwards) when new data is imported into the database.
> > The main problem with current implementations is that they don't guarantee
> > 100% uniqueness of the generated UUIDs. Sometimes, especially on very fast
> > multiprocessor machines, duplicate UUIDs are generated and returned to the
> > application, which is very bad and may result in unreliable behaviour.
> >
> > Current implementations use userspace libraries. In userspace you can't
> > easily guarantee the uniqueness of a UUID against other running _processes_.
> > If you try to, you'll need locking, e.g. with shared memory, which can
> > get very expensive.
>
> Even with a futex? Or userspace atomics?
Yes, you'll need a futex or something similar.
The bigger problem is where to put that futex so that it protects against other processes.
The best solution is probably shared memory, but then the question becomes: who is allowed to access that memory and futex?
Should any process (through the shared library) be allowed to read, write, or delete it?
At that point you have suddenly turned a locking problem into a security problem, which is probably equally hard to solve.
By the way, this is how Novell tried to solve the time-based UUID generator problem in SLES, and it's still not 100% fixed.
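To make the shared-memory option concrete, here is a minimal sketch, assuming glibc on Linux, where a pthread mutex marked PTHREAD_PROCESS_SHARED is built on a futex. The names uuid_shared_state, uuid_state_create() and uuid_next_ticks() are hypothetical. The mapping is anonymous, so only processes fork()ed after its creation share it; unrelated processes would need a named object (e.g. via shm_open()) with suitable permissions, which is exactly the access-control question above. As a simplification, a repeated or backwards timestamp is bumped by one tick instead of changing clock_seq or stalling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>

/* 100-ns intervals between 1582-10-15 (RFC 4122 epoch) and 1970-01-01. */
#define UUID_EPOCH_OFFSET 0x01B21DD213814000ULL

/* Hypothetical state shared by all UUID-generating processes. */
struct uuid_shared_state {
    pthread_mutex_t lock;   /* process-shared; futex-backed on Linux/glibc */
    uint64_t last_ticks;    /* last timestamp handed out */
};

/* Current time as 100-ns ticks since the RFC 4122 epoch. */
static uint64_t now_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 10000000ULL + (uint64_t)ts.tv_nsec / 100
           + UUID_EPOCH_OFFSET;
}

/* Anonymous shared mapping: inherited by children created with fork(). */
static struct uuid_shared_state *uuid_state_create(void)
{
    pthread_mutexattr_t attr;
    struct uuid_shared_state *st = mmap(NULL, sizeof(*st),
        PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return NULL;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&st->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    st->last_ticks = 0;
    return st;
}

/* Return a timestamp strictly greater than any previously returned one. */
static uint64_t uuid_next_ticks(struct uuid_shared_state *st)
{
    uint64_t t;
    assert(st != NULL);
    pthread_mutex_lock(&st->lock);
    t = now_ticks();
    if (t <= st->last_ticks)
        t = st->last_ticks + 1;   /* simplification: bump, don't stall */
    st->last_ticks = t;
    pthread_mutex_unlock(&st->lock);
    return t;
}
```

Each caller would then combine the returned timestamp with a clock sequence and node ID to form a version-1 UUID.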
> I think something as simple
> as a server stuffing a bunch of clock sequence numbers into a pipe
> for clients to pop into their generated UUIDs should be plenty fast
> enough.
That sounds simple and is probably fast enough.
But do you really want to add yet another daemon to the Linux system, just in case "some" application needs a UUID at some point?
I also think such an implementation would be more complex and would need more memory, file handles, and so on than this simple kernel patch.
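Matt's pipe idea can be sketched roughly as follows, assuming the values are 14-bit RFC 4122 clock sequence numbers sent as host-order uint16_t. The helpers serve_clock_seqs() and pop_clock_seq() are hypothetical, and for brevity both ends would live in one process; a real server would keep the write end in a separate daemon. POSIX guarantees that writes of at most PIPE_BUF bytes are atomic, so as long as every writer and reader works in 2-byte units, each client pops exactly one whole value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* RFC 4122 clock_seq is 14 bits wide. */
#define CLOCK_SEQ_MASK 0x3FFFu

/* Server side (hypothetical): push n consecutive clock sequence values.
 * Each value is a single 2-byte write(), well below PIPE_BUF, so it is
 * atomic with respect to other writers. */
static int serve_clock_seqs(int wfd, uint16_t start, unsigned n)
{
    assert(n <= CLOCK_SEQ_MASK + 1u);   /* no repeats within one batch */
    for (unsigned i = 0; i < n; i++) {
        uint16_t v = (uint16_t)((start + i) & CLOCK_SEQ_MASK);
        if (write(wfd, &v, sizeof(v)) != (ssize_t)sizeof(v))
            return -1;
    }
    return 0;
}

/* Client side (hypothetical): pop one value. Bytes read from a pipe are
 * consumed, so no other client can receive the same value. */
static int pop_clock_seq(int rfd, uint16_t *out)
{
    ssize_t r;
    do {
        r = read(rfd, out, sizeof(*out));
    } while (r < 0 && errno == EINTR);
    return r == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

The client side needs no further locking, since the kernel's pipe buffer already serializes readers; the cost is the extra daemon and file descriptors discussed here.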
> > The problem will get even worse with virtualization technologies like XEN and
> > containers. There it's even impossible to protect against processes in other VMs.
>
> Nor does it make sense to try! A virtual machine is an independent machine
> after all.
Yes.
> > Embedded devices could benefit from it as well. They could
> > drop their userspace implementations in favour of this smaller kernel version
> > to create UUIDs for their disks, for use in their web servers, ...
>
> That's a silly tradeoff. It's an unusual embedded device that ships
> with any need for a UUID, especially mkfs.
mkfs was a very bad example on my part; I should not have mentioned it.
Nevertheless, time-based UUIDs are used in many other, more critical applications than the e2fsprogs tools.
> And generally, putting a
> feature in the kernel has no inherent size advantage. In fact, it has
> a size disadvantage: it's no longer pageable.
True, but let's look at the facts.
The current libuuid.so library (from e2fsprogs) on Fedora 7 (i386):
   text    data     bss     dec     hex filename
   8101     368      40    8509    213d /lib/libuuid.so.1
And the kernel implementation:
   text    data     bss     dec     hex filename
   4877     604    2080    7561    1d89 drivers/char/random.o.without_uuid
   5976     752    2080    8808    2268 drivers/char/random.o.withuuid
So my patch increases the kernel by 1099 bytes of text and 148 bytes of data while guaranteeing 100% unique UUIDs.
libuuid.so takes about 8 KB of text and 368 bytes of data, and does not guarantee uniqueness.
Of course, libuuid.so also contains some helper functions, which, to be fair, shouldn't be counted.
Nevertheless, I don't think you can get a smaller, faster, and more secure implementation than one in the kernel.
Maybe it would make sense to add a CONFIG_TIME_UUID kernel option, so that distributors can decide for themselves whether they want the kernel UUID generator compiled in?
For enterprise-ready Linux distributions, at least, it's a must.
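For reference, the uniqueness argument in this thread rests on the RFC 4122 version-1 layout: a 60-bit timestamp counting 100-ns intervals since 15 October 1582, a 14-bit clock sequence, and a 48-bit node ID, with the version (1) and variant (binary 10) bits in fixed positions. The sketch below shows only that field packing, in userspace C; it is not the code from the patch, and uuid_v1_pack() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a version-1 UUID into 16 bytes in network byte order.
 * ticks:     60-bit count of 100-ns intervals since 1582-10-15
 * clock_seq: 14-bit clock sequence
 * node:      48-bit node ID (e.g. a MAC address) */
static void uuid_v1_pack(uint8_t out[16], uint64_t ticks,
                         uint16_t clock_seq, const uint8_t node[6])
{
    assert(clock_seq <= 0x3FFF);

    uint32_t time_low = (uint32_t)(ticks & 0xFFFFFFFFu);
    uint16_t time_mid = (uint16_t)((ticks >> 32) & 0xFFFFu);
    uint16_t time_hi  = (uint16_t)((ticks >> 48) & 0x0FFFu);
    /* Version 1 goes in the top 4 bits of time_hi_and_version. */
    uint16_t time_hi_and_version = (uint16_t)(time_hi | 0x1000u);
    /* Variant 10xxxxxx goes in the top 2 bits of clock_seq_hi. */
    uint8_t clock_seq_hi  = (uint8_t)(((clock_seq >> 8) & 0x3Fu) | 0x80u);
    uint8_t clock_seq_low = (uint8_t)(clock_seq & 0xFFu);

    out[0] = (uint8_t)(time_low >> 24);
    out[1] = (uint8_t)(time_low >> 16);
    out[2] = (uint8_t)(time_low >> 8);
    out[3] = (uint8_t)time_low;
    out[4] = (uint8_t)(time_mid >> 8);
    out[5] = (uint8_t)time_mid;
    out[6] = (uint8_t)(time_hi_and_version >> 8);
    out[7] = (uint8_t)time_hi_and_version;
    out[8] = clock_seq_hi;
    out[9] = clock_seq_low;
    memcpy(&out[10], node, 6);
}
```

For ticks 0x0123456789ABCDEF, clock_seq 0x1234 and node aa:bb:cc:dd:ee:ff, this packs to the canonical string 89abcdef-4567-1123-9234-aabbccddeeff.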
> ps: I'm the listed random.c maintainer so you'll want to cc: me in the
> future.
Sure.
Helge