From: Helge Deller <deller@gmx.de>
To: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Theodore Tso <tytso@mit.edu>
Subject: Re: [PATCH] Time-based RFC 4122 UUID generator
Date: Tue, 20 Nov 2007 22:59:58 +0100
Message-ID: <200711202259.58745.deller@gmx.de>
In-Reply-To: <20071120063120.GD17536@waste.org>

On Tuesday 20 November 2007, Matt Mackall wrote:
> On Sun, Nov 18, 2007 at 10:40:34PM +0100, Helge Deller wrote:
> > On Sunday 18 November 2007, Andrew Morton wrote:
> > > On Sun, 18 Nov 2007 20:38:21 +0100 Helge Deller <deller@gmx.de> wrote:
> > > 
> > > > Title: Add time-based RFC 4122 UUID generator
> > > > 
> > > > The Linux kernel currently contains the generate_random_uuid() 
> > > > function, which creates truly random UUIDs as defined by RFC 4122 and 
> > > > provides them to userspace through /proc/sys/kernel/random/boot_id and 
> > > > /proc/sys/kernel/random/uuid.
> > > > 
> > > > This patch additionally adds the "Time-based UUID" variant of RFC 4122, 
> > > > which lets userspace applications easily obtain truly unique time-based 
> > > > UUIDs through /proc/sys/kernel/random/uuid_time.
> > > > A new /proc/sys/kernel/random/uuid_time_clockseq sysctl entry is available,
> > > > so that the clock_seq value can be retained across system reboots (which
> > > > is required by RFC 4122).
> > > > 
> > > > The attached implementation uses getnstimeofday() to get a very fine-grained
> > > > timestamp, so that userspace tools can obtain many more UUIDs per unit of 
> > > > time (if needed) than before.
> > > > A mutex provides the locking that prevents simultaneously running processes 
> > > > from accidentally being handed the same UUID.
> > 
> > 
> > > Who will use this feature, and for what?
> > > (In fact, who uses the existing UUID generators, and for what?)
> > 
> > Current users I know of (but there are more):
> > - e2fsprogs uses it e.g. to create unique UUIDs for disks (it ships its own library for that)
> > - http://commons.apache.org/sandbox/id/uuid.html uses it with its own libraries
> > - SAP Netweaver on Linux uses it (http://www.sap.com/platform/netweaver/index.epx)
> > 
> > I'm mostly interested in fixing problems I see with SAP (I work for SAP).
> > When new data is imported into the database, SAP Netweaver often needs a 
> > large number of unique UUIDs within a very short time frame (to reference 
> > the data afterwards).
> > The main problem with current implementations is that they don't 100% 
> > guarantee the uniqueness of the generated UUIDs. Sometimes, especially on 
> > very fast multi-processor machines, duplicate UUIDs are generated and 
> > returned to the application, which is very bad and may result in 
> > unreliable behaviour.
> > 
> > Current implementations use userspace libraries. In userspace you can't 
> > easily protect the uniqueness of a UUID against other running _processes_.
> > If you try to, you'll need to do locking, e.g. with shared memory, which can 
> > get very expensive.
> 
> Even with a futex? Or userspace atomics? 

Yes, you'll need a futex or similar.
The harder question is where to put that futex so that it protects against other processes.
The best solution is probably shared memory, but then the question becomes: who is allowed to access this memory/futex?
Will any process (shared library) be allowed to read/write/delete it?
At that point the locking problem has turned into a security problem, which is probably equally hard to solve.
Btw, this is how Novell tried to solve the time-based UUID generator problem in SLES and it's still not 100% fixed.
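
To make the duplicate problem concrete, here is a minimal sketch of a time-based (version 1) RFC 4122 generator. It is not the code from the patch: the class name TimeUUIDGenerator, the now_ns parameter and the "bump the timestamp by one tick" rule are assumptions made for illustration, and a threading.Lock stands in for the kernel mutex. The point is that the last timestamp handed out is shared state, so every generator on the machine has to go through the same lock.

```python
import threading
import time
import uuid

# 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (RFC 4122).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


class TimeUUIDGenerator:
    """Hypothetical sketch: one shared generator guarded by one lock."""

    def __init__(self, node, clock_seq=0):
        self._lock = threading.Lock()
        self._node = node & 0xFFFFFFFFFFFF      # 48-bit node ID
        self._clock_seq = clock_seq & 0x3FFF    # 14-bit clock sequence
        self._last_ts = 0                       # last timestamp handed out

    def generate(self, now_ns=None):
        if now_ns is None:
            now_ns = time.time_ns()
        ts = now_ns // 100 + UUID_EPOCH_OFFSET  # 60-bit count of 100-ns ticks
        with self._lock:
            # Simplification (assumption): if two callers see the same tick,
            # or the clock stepped back, move one tick past the last value.
            if ts <= self._last_ts:
                ts = self._last_ts + 1
            self._last_ts = ts
        return uuid.UUID(fields=(
            ts & 0xFFFFFFFF,                          # time_low
            (ts >> 32) & 0xFFFF,                      # time_mid
            ((ts >> 48) & 0x0FFF) | 0x1000,           # time_hi + version 1
            ((self._clock_seq >> 8) & 0x3F) | 0x80,   # clock_seq_hi + variant
            self._clock_seq & 0xFF,                   # clock_seq_low
            self._node,
        ))


gen = TimeUUIDGenerator(node=0x123456789ABC, clock_seq=0x1234)
first = gen.generate(now_ns=1_000_000_000)
second = gen.generate(now_ns=1_000_000_000)  # same clock reading on purpose
```

Two independent copies of this object in two processes would each keep their own _last_ts and could return identical UUIDs for the same clock reading, which is exactly the case a userspace library has to solve with shared memory.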

> I think something as simple 
> as a server stuffing a bunch of clock sequence numbers into a pipe
> for clients to pop into their generated UUIDs should be plenty fast
> enough.

Sounds simple and is probably fast enough.
But do you really want to add yet another daemon to the Linux system, just in case "some" application needs a UUID at some point?
And I think such an implementation is more complex and would need more memory, file handles and so on than this simple kernel patch.
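
For reference, the pipe idea quoted above could look roughly like the following sketch. The function names fill_pipe and pop_clock_seq and the 2-byte little-endian encoding are assumptions for illustration only; a real server would also need to keep the pipe topped up, persist its position across reboots and decide who may open the pipe.

```python
import os
import struct


def fill_pipe(write_fd, start, count):
    """Server side (hypothetical): queue `count` consecutive 14-bit values."""
    for i in range(count):
        os.write(write_fd, struct.pack("<H", (start + i) & 0x3FFF))


def pop_clock_seq(read_fd):
    """Client side (hypothetical): consume one queued clock sequence value."""
    data = os.read(read_fd, 2)
    if len(data) != 2:
        raise RuntimeError("short read from clock sequence pipe")
    return struct.unpack("<H", data)[0]


# Both ends live in one process here only to keep the sketch self-contained.
read_fd, write_fd = os.pipe()
fill_pipe(write_fd, 0x3FFE, 4)            # wraps around after 0x3FFF
values = [pop_clock_seq(read_fd) for _ in range(4)]
os.close(read_fd)
os.close(write_fd)
```

Bytes read from a pipe are consumed, so a value popped by one client is not seen by another; the daemon, its file descriptors and its access policy are the extra moving parts mentioned above.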

> > The problem will get even worse with virtualization technologies like XEN and
> > containers. There it's even impossible to protect against processes in other VMs.
> 
> Nor does it make sense to try! A virtual machine is an independent machine
> after all.

Yes.

> > Embedded devices are another group of users that could benefit. They could 
> > drop their userspace implementations in favour of this smaller kernel version
> > to create UUIDs for their disks, for use in their webservers, ...
> 
> That's a silly tradeoff. It's an unusual embedded device that ships
> with any need for a UUID, especially mkfs. 

I think mkfs was a very bad example on my part; I should not have mentioned it.
Nevertheless, time-based UUIDs are used in many other, more critical applications than the e2fsprogs tools.

> And generally, putting a 
> feature in the kernel has no inherent size advantage. In fact, it has
> a size disadvantage: it's no longer pageable.

True, but let's look at the facts.

The current libuuid.so library (from e2fsprogs) on Fedora 7 (i386):
   text    data     bss     dec     hex filename
   8101     368      40    8509    213d /lib/libuuid.so.1

And the kernel implementation:
   text    data     bss     dec     hex filename
   4877     604    2080    7561    1d89 drivers/char/random.o.without_uuid
   5976     752    2080    8808    2268 drivers/char/random.o.withuuid

So my patch increases the kernel by 1099 bytes of text and 148 bytes of data while guaranteeing 100% unique UUIDs.
libuuid.so takes 8k of text and 368 bytes of data, and does not guarantee uniqueness.
Of course libuuid.so also contains some helper functions which, to be fair, shouldn't be counted.
Nevertheless, I think you can't get a smaller, faster and more secure implementation than one in the kernel.
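
The size figures can be rechecked with a few lines of Python. The numbers are copied from the size(1) output above; the dictionary names and the growth() helper are only illustrative.

```python
# Section sizes in bytes, copied from the size(1) output above.
without_uuid = {"text": 4877, "data": 604, "bss": 2080}
with_uuid = {"text": 5976, "data": 752, "bss": 2080}
libuuid = {"text": 8101, "data": 368, "bss": 40}


def growth(before, after):
    """Per-section size difference between two builds of the same object."""
    return {section: after[section] - before[section] for section in before}


delta = growth(without_uuid, with_uuid)
print(delta)                   # {'text': 1099, 'data': 148, 'bss': 0}
print(sum(delta.values()))     # 1247 bytes added to random.o in total
print(sum(libuuid.values()))   # 8509, the "dec" column for libuuid.so.1
```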

Maybe it would make sense to add a CONFIG_TIME_UUID kernel option, so that distributors can decide for themselves whether they want the kernel UUID generator compiled in?
At least for enterprise-ready Linux distributions it's a must.

> ps: I'm the listed random.c maintainer so you'll want to cc: me in the
> future.

Sure.

Helge
