* [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
@ 2006-11-17 13:43 Jeff Layton
2006-11-17 13:50 ` Matthew Wilcox
2006-11-17 14:24 ` Matthew Wilcox
0 siblings, 2 replies; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 13:43 UTC (permalink / raw)
To: linux-fsdevel
Here's a completely different approach to ensuring inode uniqueness.
This one was inspired by a suggestion by Al Viro. I'll refer to my
earlier email for a description of the problem...
We already have what could be considered a unique number for each inode
-- the inode pointer address. The problem is converting that into an
i_ino value.
With this patch, when new_inode is called, we pretend that all of the
kernel memory is one huge array of inode pointers, and determine what
the position of the pointer would be in the array. We then take that
value, and mask off anything higher than 32 bits. Obviously this is a
much cheaper operation than keeping track of what's been allocated.
Since we're masking off the high bits, we have a chance for collisions
when those bits become significant. On my x86_64 FC6 machine, an inode
struct is 720 bytes according to slabinfo. The next lowest power of two
is 512 (2^9), so we automatically get 9 bits for "free". So this scheme
can cope with any situation where two inode addresses are not more than
2^41 (2 petabytes) apart.
This calculation was done quickly, so I might be off by one
exponentially, but still I think we'd probably be OK for the next
several years with this scheme. inode structs are smaller on 32 bit
boxes, but they won't have 64-bit pointers so this won't be an issue
there.
There are a couple of problems, but I think this patch should address
them too:
1) because the slab allocator tends to reuse slab objects quickly,
i_ino's get reused quickly. The patch copes with this by removing the
initialization of i_generation from alloc_inode, and having new_inode
increment that value. This should make sure that when an inode slab
object is reused that it at least has a different i_generation than
before (barring major page allocation/release churn in the slab). There
may be callers of new_inode that assume that the i_generation they get
is 0. They'll need to be fixed with this scheme, but that should be
fairly easy.
2) this scheme would effectively leak inode addresses into userspace.
I'm not sure if that would be exploitable, but it's probably best not to
do it. The patch adds a static unsigned int that is initialized to a
random value at boot time. We'll xor the inode offset with this value.
That should allow for a unique i_ino value, but since the xor mask would
be secret, it shouldn't be possible to turn it back into an address.
There may be a more secure way to do this. I'm definitely open to
suggestions here.
Again, patch is still a little rough but this one shouldn't need much
work if it looks good. Comments, thoughts, suggestions appreciated.
Thanks,
Jeff
--- linux-2.6.18.noarch/fs/inode.c.ino2uint
+++ linux-2.6.18.noarch/fs/inode.c
@@ -22,6 +22,7 @@
#include <linux/bootmem.h>
#include <linux/inotify.h>
#include <linux/mount.h>
+#include <linux/random.h>
/*
* This is needed for the following functions:
@@ -98,6 +99,15 @@ static DEFINE_MUTEX(iprune_mutex);
struct inodes_stat_t inodes_stat;
static kmem_cache_t * inode_cachep __read_mostly;
+static unsigned int inode_xor_mask;
+
+/* convert an inode address into an unsigned int and xor it with a random value
+ * determined at boot time */
+static inline unsigned int inode_to_uint (struct inode *inode)
+{
+ return ((((unsigned long) (inode - (struct inode *) 0))
+ ^ inode_xor_mask) & 0xffffffff);
+}
static struct inode *alloc_inode(struct super_block *sb)
{
@@ -125,7 +135,6 @@ static struct inode *alloc_inode(struct
inode->i_size = 0;
inode->i_blocks = 0;
inode->i_bytes = 0;
- inode->i_generation = 0;
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
@@ -546,7 +555,6 @@ repeat:
*/
struct inode *new_inode(struct super_block *sb)
{
- static unsigned long last_ino;
struct inode * inode;
spin_lock_prefetch(&inode_lock);
@@ -557,7 +565,8 @@ struct inode *new_inode(struct super_blo
inodes_stat.nr_inodes++;
list_add(&inode->i_list, &inode_in_use);
list_add(&inode->i_sb_list, &sb->s_inodes);
- inode->i_ino = ++last_ino;
+ inode->i_ino = inode_to_uint(inode);
+ inode->i_generation++;
inode->i_state = 0;
spin_unlock(&inode_lock);
}
@@ -1393,6 +1402,9 @@ void __init inode_init(unsigned long mem
for (loop = 0; loop < (1 << i_hash_shift); loop++)
INIT_HLIST_HEAD(&inode_hashtable[loop]);
+
+ /* initialize the xor mask for unique inode generation */
+ get_random_bytes(&inode_xor_mask, sizeof(inode_xor_mask));
}
void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 13:43 [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion) Jeff Layton
@ 2006-11-17 13:50 ` Matthew Wilcox
2006-11-17 14:14 ` Jörn Engel
2006-11-17 14:21 ` Jeff Layton
2006-11-17 14:24 ` Matthew Wilcox
1 sibling, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2006-11-17 13:50 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-fsdevel
On Fri, Nov 17, 2006 at 08:43:00AM -0500, Jeff Layton wrote:
> +/* convert an inode address into an unsigned int and xor it with a random value
> + * determined at boot time */
> +static inline unsigned int inode_to_uint (struct inode *inode)
> +{
> + return ((((unsigned long) (inode - (struct inode *) 0))
> + ^ inode_xor_mask) & 0xffffffff);
> +}
Seems a little obfuscated. Why not simply:
return ((unsigned long)inode ^ inode_xor_mask) & 0xffffffff;
> @@ -125,7 +135,6 @@ static struct inode *alloc_inode(struct
> inode->i_size = 0;
> inode->i_blocks = 0;
> inode->i_bytes = 0;
> - inode->i_generation = 0;
> #ifdef CONFIG_QUOTA
> memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
> #endif
It seems to me that filesystems with fake inodes could instead
initialise i_generation to, say, jiffies. What do you think to that?
I like this idea, very creative.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 13:50 ` Matthew Wilcox
@ 2006-11-17 14:14 ` Jörn Engel
2006-11-17 14:24 ` Jeff Layton
2006-11-17 14:21 ` Jeff Layton
1 sibling, 1 reply; 11+ messages in thread
From: Jörn Engel @ 2006-11-17 14:14 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Jeff Layton, linux-fsdevel
On Fri, 17 November 2006 06:50:37 -0700, Matthew Wilcox wrote:
> On Fri, Nov 17, 2006 at 08:43:00AM -0500, Jeff Layton wrote:
> > +/* convert an inode address into an unsigned int and xor it with a random value
> > + * determined at boot time */
> > +static inline unsigned int inode_to_uint (struct inode *inode)
> > +{
> > + return ((((unsigned long) (inode - (struct inode *) 0))
> > + ^ inode_xor_mask) & 0xffffffff);
> > +}
>
> Seems a little obfuscated. Why not simply:
>
> return ((unsigned long)inode ^ inode_xor_mask) & 0xffffffff;
Because that could give i_ino collisions with >4GiB memory. The
original would require something like 2PiB memory before the first
collision.
What I wonder, though, is how much code gcc generates for the pointer
arithmetic. Maybe something like
return (((unsigned long)inode >> INODE_SHIFT) ^ inode_xor_mask)
& 0xffffffff);
would be faster.
> > @@ -125,7 +135,6 @@ static struct inode *alloc_inode(struct
> > inode->i_size = 0;
> > inode->i_blocks = 0;
> > inode->i_bytes = 0;
> > - inode->i_generation = 0;
> > #ifdef CONFIG_QUOTA
> > memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
> > #endif
>
> It seems to me that filesystems with fake inodes could instead
> initialise i_generation to, say, jiffies. What do you think to that?
If you are talking about inode_init_once() here, I like the idea.
i_generation must be initialized somehow, even a 4-byte information
leak could be a problem. But the initialization should happen in the
rare event of allocating a slab page, not when reusing a deleted
inode.
Jeff, why do you increment i_generation in new_inode instead of here?
Jörn
--
...one more straw can't possibly matter...
-- Kirby Bakken
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 13:50 ` Matthew Wilcox
2006-11-17 14:14 ` Jörn Engel
@ 2006-11-17 14:21 ` Jeff Layton
2006-11-17 16:31 ` Jeff Layton
1 sibling, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 14:21 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel
On Fri, 2006-11-17 at 06:50 -0700, Matthew Wilcox wrote:
> On Fri, Nov 17, 2006 at 08:43:00AM -0500, Jeff Layton wrote:
> > +/* convert an inode address into an unsigned int and xor it with a random value
> > + * determined at boot time */
> > +static inline unsigned int inode_to_uint (struct inode *inode)
> > +{
> > + return ((((unsigned long) (inode - (struct inode *) 0))
> > + ^ inode_xor_mask) & 0xffffffff);
> > +}
>
> Seems a little obfuscated. Why not simply:
>
> return ((unsigned long)inode ^ inode_xor_mask) & 0xffffffff;
Because the lower 8-9 bits of the inode pointer aren't significant
(presuming an inode struct size of ~400-800 bytes). If we take those out
of the picture then we extend the range of addresses that we can
uniquely squish into a 32 bit value.
Of course, all of this depends on the idea that the slab allocator grabs
pages that are somewhat close together in the kernel's address space.
I'm trying to figure out whether that is the case or not...
I've also been told that the address subtraction above is bogus and
undefined, though it does seem to work. I'll probably need to respin
that part into actual bit shifts by hand...
-- Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 14:14 ` Jörn Engel
@ 2006-11-17 14:24 ` Jeff Layton
0 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 14:24 UTC (permalink / raw)
To: Jörn Engel; +Cc: Matthew Wilcox, linux-fsdevel
On Fri, 2006-11-17 at 15:14 +0100, Jörn Engel wrote:
> If you are talking about inode_init_once() here, I like the idea.
> i_generation must be initialized somehow, even a 4-byte information
> leak could be a problem. But the initialization should happen in the
> rare event of allocating a slab page, not when reusing a deleted
> inode.
>
> Jeff, why do you increment i_generation in new_inode instead of here?
I suppose we could do that...that might be better for stuff that calls
alloc_inode directly.
-- Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 13:43 [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion) Jeff Layton
2006-11-17 13:50 ` Matthew Wilcox
@ 2006-11-17 14:24 ` Matthew Wilcox
2006-11-17 14:48 ` Jeff Layton
1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2006-11-17 14:24 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-fsdevel
On Fri, Nov 17, 2006 at 08:43:00AM -0500, Jeff Layton wrote:
> 2) this scheme would effectively leak inode addresses into userspace.
> I'm not sure if that would be exploitable, but it's probably best not to
> do it. The patch adds a static unsigned int that is initialized to a
> random value at boot time. We'll xor the inode offset with this value.
> That should allow for a unique i_ino value, but since the xor mask would
> be secret, it shouldn't be possible to turn it back into an address.
> There may be a more secure way to do this. I'm definitely open to
> suggestions here.
I *think* the xor mask is mere obfuscation. It looks likely that you can
recover it with a little bit of trial and error. If you can force the
filesystem to hand you back new inodes quickly such that there is a high
probability you get consecutive allocations, you'll get a sequence which
would be spaced 700-odd bytes apart, except that it's been xored. Since
you know it's incrementing, if you see the sequence decrease, you'll
know that was a 1 in that bit.
It'd be a bit more complex than that, and cryptanalysis was never my
forte, but I suspect we should either use a folded hash like md5, or
give up and just divide the address by sizeof(struct inode). Sure,
divides are slow, but this is a divide by a constant, so it shouldn't be
that bad.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 14:24 ` Matthew Wilcox
@ 2006-11-17 14:48 ` Jeff Layton
2006-11-17 15:01 ` Dave Kleikamp
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 14:48 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel
On Fri, 2006-11-17 at 07:24 -0700, Matthew Wilcox wrote:
> I *think* the xor mask is mere obfuscation. It looks likely that you can
> recover it with a little bit of trial and error. If you can force the
> filesystem to hand you back new inodes quickly such that there is a high
> probability you get consecutive allocations, you'll get a sequence which
> would be spaced 700-odd bytes apart, except that it's been xored. Since
> you know it's incrementing, if you see the sequence decrease, you'll
> know that was a 1 in that bit.
I think you're right, the addresses would often be sequential, so this
is probably crackable. I'll look over the md5 routines when I get the
chance, though if someone more cryptographically inclined than I has a
different suggestion, I'd love to hear it.
-- Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 14:48 ` Jeff Layton
@ 2006-11-17 15:01 ` Dave Kleikamp
2006-11-17 15:06 ` Jeff Layton
0 siblings, 1 reply; 11+ messages in thread
From: Dave Kleikamp @ 2006-11-17 15:01 UTC (permalink / raw)
To: Jeff Layton; +Cc: Matthew Wilcox, linux-fsdevel
On Fri, 2006-11-17 at 09:48 -0500, Jeff Layton wrote:
> On Fri, 2006-11-17 at 07:24 -0700, Matthew Wilcox wrote:
> > I *think* the xor mask is mere obfuscation. It looks likely that you can
> > recover it with a little bit of trial and error. If you can force the
> > filesystem to hand you back new inodes quickly such that there is a high
> > probability you get consecutive allocations, you'll get a sequence which
> > would be spaced 700-odd bytes apart, except that it's been xored. Since
> > you know it's incrementing, if you see the sequence decrease, you'll
> > know that was a 1 in that bit.
>
> I think you're right, the addresses would often be sequential, so this
> is probably crackable.
Wouldn't you only be able to only crack a few of the low-order bits due
to a cluster of inodes being sequential? I don't think you'd be able
crack enough of it to be useful. You may be able to determine where
some inodes are relative to others, but I don't think you'd be able to
point the their location in memory. I don't know anything about crypto,
so I could be wrong.
> I'll look over the md5 routines when I get the
> chance, though if someone more cryptographically inclined than I has a
> different suggestion, I'd love to hear it.
> -- Jeff
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 15:01 ` Dave Kleikamp
@ 2006-11-17 15:06 ` Jeff Layton
2006-11-17 15:26 ` Dave Kleikamp
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 15:06 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: Matthew Wilcox, linux-fsdevel
On Fri, 2006-11-17 at 09:01 -0600, Dave Kleikamp wrote:
> Wouldn't you only be able to only crack a few of the low-order bits due
> to a cluster of inodes being sequential? I don't think you'd be able
> crack enough of it to be useful. You may be able to determine where
> some inodes are relative to others, but I don't think you'd be able to
> point the their location in memory. I don't know anything about crypto,
> so I could be wrong.
>
On a 64-bit kernel, that would be the case. On a 32-bit kernel, there
are no high order bits to chop off, so this would effectively give you
the address.
-- Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 15:06 ` Jeff Layton
@ 2006-11-17 15:26 ` Dave Kleikamp
0 siblings, 0 replies; 11+ messages in thread
From: Dave Kleikamp @ 2006-11-17 15:26 UTC (permalink / raw)
To: Jeff Layton; +Cc: Matthew Wilcox, linux-fsdevel
On Fri, 2006-11-17 at 10:06 -0500, Jeff Layton wrote:
> On Fri, 2006-11-17 at 09:01 -0600, Dave Kleikamp wrote:
>
> > Wouldn't you only be able to only crack a few of the low-order bits due
> > to a cluster of inodes being sequential? I don't think you'd be able
> > crack enough of it to be useful. You may be able to determine where
> > some inodes are relative to others, but I don't think you'd be able to
> > point the their location in memory. I don't know anything about crypto,
> > so I could be wrong.
> >
>
> On a 64-bit kernel, that would be the case. On a 32-bit kernel, there
> are no high order bits to chop off, so this would effectively give you
> the address.
By a few of the low-order bits, I mean very few, not 32. The slab
allocator generally allocates one page at a time, so you typically don't
get more than about 6 inodes in a slab page. Even if you consider that
you may be able allocate several slab pages together, it would be hard
to get very many contiguous pages of inode slab cache. Even if it were
possible to force the system to allocate around 256 contiguous inodes,
you would still only be able to determine 8 bits out of 32. At most one
or two more if you could determine a pattern of inode numbers that would
be skipped due to the inclusion of struct inode within the fs-dependent
inode structure. I may be wrong, but I doubt that someone could
determine the entire mask from the observed i_ino's.
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)
2006-11-17 14:21 ` Jeff Layton
@ 2006-11-17 16:31 ` Jeff Layton
0 siblings, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2006-11-17 16:31 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel
On Fri, 2006-11-17 at 09:21 -0500, Jeff Layton wrote:
> Because the lower 8-9 bits of the inode pointer aren't significant
> (presuming an inode struct size of ~400-800 bytes). If we take those out
> of the picture then we extend the range of addresses that we can
> uniquely squish into a 32 bit value.
>
> Of course, all of this depends on the idea that the slab allocator grabs
> pages that are somewhat close together in the kernel's address space.
> I'm trying to figure out whether that is the case or not...
It sounds like we can't count on that though. A NUMA system will
apparently map pages *anywhere* within the entire 64-bit address range.
If that's the case, we can't count on the addresses being close enough
together to make this work.
Bummer, I really liked this scheme :-). I suppose I'll have to look at
other options...
-- Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-11-17 16:31 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-17 13:43 [RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion) Jeff Layton
2006-11-17 13:50 ` Matthew Wilcox
2006-11-17 14:14 ` Jörn Engel
2006-11-17 14:24 ` Jeff Layton
2006-11-17 14:21 ` Jeff Layton
2006-11-17 16:31 ` Jeff Layton
2006-11-17 14:24 ` Matthew Wilcox
2006-11-17 14:48 ` Jeff Layton
2006-11-17 15:01 ` Dave Kleikamp
2006-11-17 15:06 ` Jeff Layton
2006-11-17 15:26 ` Dave Kleikamp
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).