Re: [PATCH 2/2] UUID: Time-based RFC 4122 UUID generator

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Helge Deller <deller@gmx.de>
To: Theodore Tso <tytso@thunk.org>
Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] UUID: Time-based RFC 4122 UUID generator
Date: Sun, 21 Oct 2007 22:11:41 +0200	[thread overview]
Message-ID: <200710212211.41403.deller@gmx.de> (raw)
In-Reply-To: <20071021122639.GB7045@thunk.org>

On Sunday 21 October 2007, Theodore Tso wrote:
> On Sat, Oct 20, 2007 at 10:00:03PM +0200, Helge Deller wrote:
> > Additionally a mutex takes care of the proper locking against a mistaken
> > double creation of UUIDs for simultanious running processes.
> 
> This is trickier to do in userspace, given that in userspace we have
> to worry about malicious code that might grab and hold the mutex for
> long periods of time.  It is soluble, though.

Yes, userspace would need quite some overhead/syncronization/locking tricks to get it right.
That's basically one of the reasons I prefer the in-kernel solution.
The other reason is, that due to reduced code size compared to user-space, it's even a nice and clean solution for embedded devices.

> > +	/* Determine clock sequence (max. 14 bit) */
> > +	if (unlikely(!clock_seq_initialized)) {
> > +		get_random_bytes(&clock_seq, sizeof(clock_seq));
> > +		clock_seq &= 0x3fff;
> > +		clock_seq_started = clock_seq;
> > +		clock_seq_initialized = 1;
> 
> If you're really going to do this right, it should be possible to get
> and set the clock sequence from userspace, since the clock sequence is
> supposed to be retained across system bootups.  Otherwise, it's
> possible for you to get really unlucky, and pick the same clock
> sequence number just at the same point that the clock gets changed.
> Not all that likely, granted, but technically it's good thing to do
> and required by RFC 4122.

Ok, I'll add that to the next version of the patch.
Do you have a suggestion how to realize that ?
I currently see two possibilities:
a) reading: userspace reads /proc/sys/kernel/random/uuid_time and parses/stores the clock sequence number.
    writing: just echo a value to /proc/sys/kernel/random/uuid_time, kernel will use this value then as new clock sequence number.
b) add a new sysfs entry, e.g. /proc/sys/kernel/random/uuid_time_clockseq, to read/write the current/new value

Possibility b) is probably the cleaner solution, although it adds some more code.
What's your opinion ?

> I don't do this in the userspace library, but again that's because of
> the security issue of dealiing with multiple userids needing to update
> a file.  (What we really need to do this right in userspace is the
> concept of lightweight privileged shared libraries, such as what
> Multics had.)  So if that's going to be the justification to do this
> in the kernel, it should be possible to set/get the clock sequence
> number, so that in an init.d script, the clock sequence numer can be
> fetched at bootup and stored at shutdown.  (Yeah, that still leaves
> open the possibility of how to handle an unclean shutdown.  If you
> really wanted to be anal about guaranteeing that a clock sequence
> number was never reused, any time a clock sequence number is changed,
> a hal event could be sent that would cause a userspace helper script
> to sample the clock sequence and update the file.)

Yuk. If you really want, I'll add that hal event...

> > +	/* Set the spatially unique node identifier */
> > +#ifdef CONFIG_NET
> > +	read_lock(&dev_base_lock);
> > +	for_each_netdev(&init_net, d) {
> > +		if (d->type == ARPHRD_ETHER && d->addr_len == ETH_ALEN 
> > +		    && d != init_net.loopback_dev) {
> > +			memcpy(&uuid_out[10], d->dev_addr, ETH_ALEN);
> > +			netdev_found = 1;
> > +			break;
> > +		}
> > +	}
> > +	read_unlock(&dev_base_lock);
> > +#endif
> 
> This means you have to grab the dev_base_lock each time you generate a
> UUID.  

Yes, IMHO this seems unavoidable. 
Basically even the user-space implementation gets the lock. It's just indirect and less efficient.

> In addition, since this code always grabs the first ethernet 
> device in the list, if the list has changed --- for example, if
> someone inserts a PCMCIA wireless or ethernet card, or removes the
> card for power management reasons --- you could end up changing the
> MAC address and forcing a clock sequence bump even though it's not
> necessary.  So there are a couple of things that we might do here.

Agreed.

> First of all, if we *know* that that a particular mac address is
> associated with a card which is hardwired to the machine, we're
> actually better off using it even if it is no longer present in the
> list.  But aside from getting a userspace hint, I don't see a good way
> for us to implement this.

Me neither.

> What you probably should do, since you're keeping the MAC address
> around to compare if it changes, is to search the list to see if the
> MAC address is still on the list, and use it if it's there; and if it
> isn't, then you can use the first ethernet address found instead.

Yes, I'll do that in the next version of the patch.

> > +	if (unlikely(!netdev_found)) {
> > +		/* use bootid's nodeID if no network interface found */
> 
> If CONFIG_NET is undefined, the netdev_found will always be false, so
> the unlikely() would be particularly inappropriate.  Probably will
> only matter for embedded systems, but they won't thank you....

When I wrote that, I thought the same and my first idea was just to move the if clause before the #endif, e.g. like this:
	if (unlikely(!netdev_found))
	#endif /* CONFIG_NET */
	{ ....
But I didn't like that code folding very much.

On the other side, if CONFIG_NET is undefined, netdev_found still stays at "0", and the compiler should
convert that directly to 
	if (unlikely(!0)) {
	/* use bootid's nodeID if no network interface found */
and just drop the if() and thus the unlikely() statement.
So I think the compiler will just optimize it away and ignore the unlikely (because as it knows, it's not unlikely any more).

> > +	/* if MAC/NodeID changed, create a new clock_seq value */
> > +	if (unlikely(memcmp(&uuid_out[10], &last_mac, ETH_ALEN))) {
> > +		memcpy(&last_mac, &uuid_out[10], ETH_ALEN);
> > +		/* for very first UUID, do not increment clock_seq */
> > +		if (unlikely(clock_seq_initialized == 1))
> > +			clock_seq_initialized = 2;
> > +		else
> > +			goto inc_clock_seq;
> > +	}
> 
> The goto loop could be much simplified if you grab the MAC/NodeID
> *before* you do clock_seq logic.  Right now, if the Mac/NodeID
> changes, you end up having to redo a lot of processing for no good
> reason (and under the uuid_mutex).

I'm not sure if this gains much. Probably the jumps will just be at some other place or the code will become more complex.
MAC/NodeID changes quite seldom. And if it changes, the machine will be quite busy anyway, e.g. configure IP addresses/DHCP, ...
But anyway, I'll give it a try for the next patch.

Thanks,
Helge

next prev parent reply	other threads:[~2007-10-21 20:13 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-20 19:39 [PATCH 0/2] Time-based RFC 4122 UUID generator Helge Deller
2007-10-20 19:58 ` [PATCH 1/2] UUID: set multicast bit in pseudo-random MAC address Helge Deller
2007-10-21 11:32   ` Theodore Tso
2007-10-21 19:09     ` Helge Deller
2007-10-20 20:00 ` [PATCH 2/2] UUID: Time-based RFC 4122 UUID generator Helge Deller
2007-10-21 12:26   ` Theodore Tso
2007-10-21 20:11     ` Helge Deller [this message]
2007-11-04 21:42       ` Helge Deller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200710212211.41403.deller@gmx.de \
    --to=deller@gmx.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@thunk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.