* [PATCH] Don't initialise new inode generation numbers to zero V2
@ 2008-04-22 1:58 David Chinner
2008-04-22 4:05 ` Greg Banks
0 siblings, 1 reply; 9+ messages in thread
From: David Chinner @ 2008-04-22 1:58 UTC (permalink / raw)
To: xfs-dev; +Cc: xfs-oss, gnb
Don't initialise new inode generation numbers to zero
When we allocation new inode chunks, we initialise the generation
numbers to zero. This works fine until we delete a chunk and then
reallocate it, resulting in the same inode numbers but with a
reset generation count. This can result in inode/generation
pairs of different inodes occurring relatively close together.
Given that the inode/gen pair makes up the "unique" portion of
an NFS filehandle on XFS, this can result in file handles cached
on clients being seen on the wire from the server but refer to
a different file. This causes .... issues for NFS clients.
Hence we need a unique generation number initialisation for
each inode to prevent reuse of a small portion of the generation
number space. Make this initialiser per-allocation group so
that it is not a single point of contention in the filesystem,
and increment it on every allocation within an AG to reduce the
chance that a generation number is reused for a given inode number
if the inode chunk is deleted and reallocated immediately
afterwards.
Version 2:
o remove persistent per-AGI agi_newinogen field and replace with
randomly generated 32 bit number for each new cluster. This prevents
NFS clients from potentially guessing what the next generation
number is going to be.
Signed-off-by: Dave Chinner <dgc@sgi.com>
---
drivers/char/random.c | 1 +
fs/xfs/xfs_ialloc.c | 10 ++++++++++
2 files changed, 11 insertions(+)
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-04-21 09:48:39.279043874 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-04-21 10:14:07.242106131 +1000
@@ -147,6 +147,7 @@ xfs_ialloc_ag_alloc(
int version; /* inode version number to use */
int isaligned = 0; /* inode allocation at stripe unit */
/* boundary */
+ unsigned int gen;
args.tp = tp;
args.mp = tp->t_mountp;
@@ -290,6 +291,14 @@ xfs_ialloc_ag_alloc(
else
version = XFS_DINODE_VERSION_1;
+ /*
+ * Seed the new inode cluster with a random generation number. This
+ * prevents short-term reuse of generation numbers if a chunk is
+ * freed and then immediately reallocated. We use random numbers
+ * rather than a linear progression to prevent the next generation
+ * number from being guessable.
+ */
+ gen = get_random_int();
for (j = 0; j < nbufs; j++) {
/*
* Get the block.
@@ -309,6 +318,7 @@ xfs_ialloc_ag_alloc(
free = XFS_MAKE_IPTR(args.mp, fbuf, i);
free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
free->di_core.di_version = version;
+ free->di_core.di_gen = cpu_to_be32(gen);
free->di_next_unlinked = cpu_to_be32(NULLAGINO);
xfs_ialloc_log_di(tp, fbuf, i,
XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
Index: 2.6.x-xfs-new/drivers/char/random.c
===================================================================
--- 2.6.x-xfs-new.orig/drivers/char/random.c 2008-03-13 13:05:54.000000000 +1100
+++ 2.6.x-xfs-new/drivers/char/random.c 2008-04-21 10:12:18.464202803 +1000
@@ -1646,6 +1646,7 @@ unsigned int get_random_int(void)
*/
return secure_ip_id((__force __be32)(current->pid + jiffies));
}
+EXPORT_SYMBOL(get_random_int);
/*
* randomize_range() returns a start address such that
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-22 1:58 [PATCH] Don't initialise new inode generation numbers to zero V2 David Chinner @ 2008-04-22 4:05 ` Greg Banks 2008-04-22 5:04 ` David Chinner 0 siblings, 1 reply; 9+ messages in thread From: Greg Banks @ 2008-04-22 4:05 UTC (permalink / raw) To: David Chinner; +Cc: xfs-dev, xfs-oss David Chinner wrote: > Don't initialise new inode generation numbers to zero > > When we allocation new inode chunks, we initialise the generation > numbers to zero. This works fine until we delete a chunk and then > reallocate it, resulting in the same inode numbers but with a > reset generation count. This can result in inode/generation > pairs of different inodes occurring relatively close together. > > Given that the inode/gen pair makes up the "unique" portion of > an NFS filehandle on XFS, this can result in file handles cached > on clients being seen on the wire from the server but refer to > a different file. This causes .... issues for NFS clients. > > Hence we need a unique generation number initialisation for > each inode to prevent reuse of a small portion of the generation > number space. Make this initialiser per-allocation group so > that it is not a single point of contention in the filesystem, > and increment it on every allocation within an AG to reduce the > chance that a generation number is reused for a given inode number > if the inode chunk is deleted and reallocated immediately > afterwards. > > Version 2: > o remove persistent per-AGI agi_newinogen field and replace with > randomly generated 32 bit number for each new cluster. This prevents > NFS clients from potentially guessing what the next generation > number is going to be. > I'm confused, why would an NFS client be trying to guess the generation number? AFAICS the important thing is to ensure that the (inode,gen) tuple isn't reused for a long time to prevent accidental filehandle identity issues on clients; whether the gen is predictable or not doesn't matter at all. You could just as well use a 32 bit hash of the nanosecond-precision timestamp when the inode cluster is first allocated. This patch seems to me to be draining entropy from the random bit pool, which is needed for important things on the network, for little benefit. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-22 4:05 ` Greg Banks @ 2008-04-22 5:04 ` David Chinner 2008-04-25 8:57 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: David Chinner @ 2008-04-22 5:04 UTC (permalink / raw) To: Greg Banks; +Cc: David Chinner, xfs-dev, xfs-oss On Tue, Apr 22, 2008 at 02:05:47PM +1000, Greg Banks wrote: > David Chinner wrote: > > Don't initialise new inode generation numbers to zero > > > > When we allocation new inode chunks, we initialise the generation > > numbers to zero. This works fine until we delete a chunk and then > > reallocate it, resulting in the same inode numbers but with a > > reset generation count. This can result in inode/generation > > pairs of different inodes occurring relatively close together. > > > > Given that the inode/gen pair makes up the "unique" portion of > > an NFS filehandle on XFS, this can result in file handles cached > > on clients being seen on the wire from the server but refer to > > a different file. This causes .... issues for NFS clients. > > > > Hence we need a unique generation number initialisation for > > each inode to prevent reuse of a small portion of the generation > > number space. Make this initialiser per-allocation group so > > that it is not a single point of contention in the filesystem, > > and increment it on every allocation within an AG to reduce the > > chance that a generation number is reused for a given inode number > > if the inode chunk is deleted and reallocated immediately > > afterwards. > > > > Version 2: > > o remove persistent per-AGI agi_newinogen field and replace with > > randomly generated 32 bit number for each new cluster. This prevents > > NFS clients from potentially guessing what the next generation > > number is going to be. > > > I'm confused, why would an NFS client be trying to guess the generation > number? AFAICS the important thing is to ensure that the (inode,gen) > tuple isn't reused for a long time to prevent accidental filehandle > identity issues on clients; whether the gen is predictable or not > doesn't matter at all. Yeah, that's exactly what I said to Christoph, but that's the issue he raised w.r.t a malicious client triggering inode/gen collisions intentionally. If that's not a problem, then I can just use random32() for the inode number. If it is a real problem, then it needs to be a cryptographically secure random number. Personally, I don't care either way - I just want to get the issue fixed. Christoph, care to explain how and why this is a problem to everyone? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-22 5:04 ` David Chinner @ 2008-04-25 8:57 ` Christoph Hellwig 2008-04-28 3:11 ` David Chinner 2008-04-28 3:24 ` Greg Banks 0 siblings, 2 replies; 9+ messages in thread From: Christoph Hellwig @ 2008-04-25 8:57 UTC (permalink / raw) To: David Chinner; +Cc: Greg Banks, xfs-dev, xfs-oss On Tue, Apr 22, 2008 at 03:04:48PM +1000, David Chinner wrote: > > I'm confused, why would an NFS client be trying to guess the generation > > number? AFAICS the important thing is to ensure that the (inode,gen) > > tuple isn't reused for a long time to prevent accidental filehandle > > identity issues on clients; whether the gen is predictable or not > > doesn't matter at all. > > Yeah, that's exactly what I said to Christoph, but that's the issue he > raised w.r.t a malicious client triggering inode/gen collisions > intentionally. If that's not a problem, then I can just use random32() > for the inode number. If it is a real problem, then it needs to be > a cryptographically secure random number. Personally, I don't care > either way - I just want to get the issue fixed. > > Christoph, care to explain how and why this is a problem to everyone? XFS has some heuristics for inode placement and of course for removing the inode cluster and re-allocting it. I have a gut feeling that there is a small chance to trigger a re-use via nfs operations. Making the initial generation number random means we remove one of the major user-triggerable inputs from the equation. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-25 8:57 ` Christoph Hellwig @ 2008-04-28 3:11 ` David Chinner 2008-04-28 5:59 ` Christoph Hellwig 2008-04-28 3:24 ` Greg Banks 1 sibling, 1 reply; 9+ messages in thread From: David Chinner @ 2008-04-28 3:11 UTC (permalink / raw) To: Christoph Hellwig; +Cc: David Chinner, Greg Banks, xfs-dev, xfs-oss On Fri, Apr 25, 2008 at 04:57:50AM -0400, Christoph Hellwig wrote: > On Tue, Apr 22, 2008 at 03:04:48PM +1000, David Chinner wrote: > > > I'm confused, why would an NFS client be trying to guess the generation > > > number? AFAICS the important thing is to ensure that the (inode,gen) > > > tuple isn't reused for a long time to prevent accidental filehandle > > > identity issues on clients; whether the gen is predictable or not > > > doesn't matter at all. > > > > Yeah, that's exactly what I said to Christoph, but that's the issue he > > raised w.r.t a malicious client triggering inode/gen collisions > > intentionally. If that's not a problem, then I can just use random32() > > for the inode number. If it is a real problem, then it needs to be > > a cryptographically secure random number. Personally, I don't care > > either way - I just want to get the issue fixed. > > > > Christoph, care to explain how and why this is a problem to everyone? > > XFS has some heuristics for inode placement and of course for removing > the inode cluster and re-allocting it. I have a gut feeling that there > is a small chance to trigger a re-use via nfs operations. Making the > initial generation number random means we remove one of the major > user-triggerable inputs from the equation. Ok, so is random32() good enough or does it need to be get_random_int()? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-28 3:11 ` David Chinner @ 2008-04-28 5:59 ` Christoph Hellwig 2008-04-28 6:20 ` David Chinner 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2008-04-28 5:59 UTC (permalink / raw) To: David Chinner; +Cc: Christoph Hellwig, Greg Banks, xfs-dev, xfs-oss On Mon, Apr 28, 2008 at 01:11:20PM +1000, David Chinner wrote: > Ok, so is random32() good enough or does it need to be get_random_int()? I'm not sure offhand what random32 actually is, but a not cryptographically strong random number should be enough. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-28 5:59 ` Christoph Hellwig @ 2008-04-28 6:20 ` David Chinner 2008-04-28 6:25 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: David Chinner @ 2008-04-28 6:20 UTC (permalink / raw) To: Christoph Hellwig; +Cc: David Chinner, Greg Banks, xfs-dev, xfs-oss On Mon, Apr 28, 2008 at 01:59:22AM -0400, Christoph Hellwig wrote: > On Mon, Apr 28, 2008 at 01:11:20PM +1000, David Chinner wrote: > > Ok, so is random32() good enough or does it need to be get_random_int()? > > I'm not sure offhand what random32 actually is, but a not > cryptographically strong random number should be enough. >From lib/random32.c: This is a maximally equidistributed combined Tausworthe generator based on code from GNU Scientific Library 1.5 (30 Jun 2004) x_n = (s1_n ^ s2_n ^ s3_n) s1_{n+1} = (((s1_n & 4294967294) <<12) ^ (((s1_n <<13) ^ s1_n) >>19)) s2_{n+1} = (((s2_n & 4294967288) << 4) ^ (((s2_n << 2) ^ s2_n) >>25)) s3_{n+1} = (((s3_n & 4294967280) <<17) ^ (((s3_n << 3) ^ s3_n) >>11)) The period of this generator is about 2^88. From: P. L'Ecuyer, "Maximally Equidistributed Combined Tausworthe Generators", Mathematics of Computation, 65, 213 (1996), 203--213. ----- And uses per-cpu state to for the next number. Should be good enough. Patch below. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group --- Don't initialise new inode generation numbers to zero When we allocation new inode chunks, we initialise the generation numbers to zero. This works fine until we delete a chunk and then reallocate it, resulting in the same inode numbers but with a reset generation count. This can result in inode/generation pairs of different inodes occurring relatively close together. Given that the inode/gen pair makes up the "unique" portion of an NFS filehandle on XFS, this can result in file handles cached on clients being seen on the wire from the server but refer to a different file. This causes .... issues for NFS clients. Hence we need a unique generation number initialisation for each inode to prevent reuse of a small portion of the generation number space. Make this initialiser per-allocation group so that it is not a single point of contention in the filesystem, and increment it on every allocation within an AG to reduce the chance that a generation number is reused for a given inode number if the inode chunk is deleted and reallocated immediately afterwards. Version 3: o use random32 rather than get_random_int() as cryptographically secure random numbers are not really necessary here. Version 2: o remove persistent per-AGI agi_newinogen field and replace with randomly generated 32 bit number for each new cluster. This prevents NFS clients from potentially guessing what the next generation number is going to be and removes the need for persistent numbers on disk. Signed-off-by: Dave Chinner <dgc@sgi.com> --- fs/xfs/xfs_ialloc.c | 10 ++++++++++ 1 file changed, 10 insertions(+) Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c =================================================================== --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-04-28 16:12:57.376445802 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-04-28 16:15:04.427919630 +1000 @@ -147,6 +147,7 @@ xfs_ialloc_ag_alloc( int version; /* inode version number to use */ int isaligned = 0; /* inode allocation at stripe unit */ /* boundary */ + unsigned int gen; args.tp = tp; args.mp = tp->t_mountp; @@ -290,6 +291,14 @@ xfs_ialloc_ag_alloc( else version = XFS_DINODE_VERSION_1; + /* + * Seed the new inode cluster with a random generation number. This + * prevents short-term reuse of generation numbers if a chunk is + * freed and then immediately reallocated. We use random numbers + * rather than a linear progression to prevent the next generation + * number from easily guessable. + */ + gen = random32(); for (j = 0; j < nbufs; j++) { /* * Get the block. @@ -309,6 +318,7 @@ xfs_ialloc_ag_alloc( free = XFS_MAKE_IPTR(args.mp, fbuf, i); free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC); free->di_core.di_version = version; + free->di_core.di_gen = cpu_to_be32(gen); free->di_next_unlinked = cpu_to_be32(NULLAGINO); xfs_ialloc_log_di(tp, fbuf, i, XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-28 6:20 ` David Chinner @ 2008-04-28 6:25 ` Christoph Hellwig 0 siblings, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2008-04-28 6:25 UTC (permalink / raw) To: David Chinner; +Cc: Christoph Hellwig, Greg Banks, xfs-dev, xfs-oss Looks good. On Mon, Apr 28, 2008 at 04:20:32PM +1000, David Chinner wrote: > Don't initialise new inode generation numbers to zero > > When we allocation new inode chunks, we initialise the generation > numbers to zero. This works fine until we delete a chunk and then > reallocate it, resulting in the same inode numbers but with a > reset generation count. This can result in inode/generation > pairs of different inodes occurring relatively close together. > > Given that the inode/gen pair makes up the "unique" portion of > an NFS filehandle on XFS, this can result in file handles cached > on clients being seen on the wire from the server but refer to > a different file. This causes .... issues for NFS clients. > > Hence we need a unique generation number initialisation for > each inode to prevent reuse of a small portion of the generation > number space. Make this initialiser per-allocation group so > that it is not a single point of contention in the filesystem, > and increment it on every allocation within an AG to reduce the > chance that a generation number is reused for a given inode number > if the inode chunk is deleted and reallocated immediately > afterwards. > > Version 3: > o use random32 rather than get_random_int() as cryptographically > secure random numbers are not really necessary here. > > Version 2: > o remove persistent per-AGI agi_newinogen field and replace with > randomly generated 32 bit number for each new cluster. This prevents > NFS clients from potentially guessing what the next generation > number is going to be and removes the need for persistent numbers on > disk. > > Signed-off-by: Dave Chinner <dgc@sgi.com> > --- > fs/xfs/xfs_ialloc.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c > =================================================================== > --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-04-28 16:12:57.376445802 +1000 > +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-04-28 16:15:04.427919630 +1000 > @@ -147,6 +147,7 @@ xfs_ialloc_ag_alloc( > int version; /* inode version number to use */ > int isaligned = 0; /* inode allocation at stripe unit */ > /* boundary */ > + unsigned int gen; > > args.tp = tp; > args.mp = tp->t_mountp; > @@ -290,6 +291,14 @@ xfs_ialloc_ag_alloc( > else > version = XFS_DINODE_VERSION_1; > > + /* > + * Seed the new inode cluster with a random generation number. This > + * prevents short-term reuse of generation numbers if a chunk is > + * freed and then immediately reallocated. We use random numbers > + * rather than a linear progression to prevent the next generation > + * number from easily guessable. > + */ > + gen = random32(); > for (j = 0; j < nbufs; j++) { > /* > * Get the block. > @@ -309,6 +318,7 @@ xfs_ialloc_ag_alloc( > free = XFS_MAKE_IPTR(args.mp, fbuf, i); > free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC); > free->di_core.di_version = version; > + free->di_core.di_gen = cpu_to_be32(gen); > free->di_next_unlinked = cpu_to_be32(NULLAGINO); > xfs_ialloc_log_di(tp, fbuf, i, > XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED); ---end quoted text--- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Don't initialise new inode generation numbers to zero V2 2008-04-25 8:57 ` Christoph Hellwig 2008-04-28 3:11 ` David Chinner @ 2008-04-28 3:24 ` Greg Banks 1 sibling, 0 replies; 9+ messages in thread From: Greg Banks @ 2008-04-28 3:24 UTC (permalink / raw) To: Christoph Hellwig; +Cc: David Chinner, xfs-dev, xfs-oss Christoph Hellwig wrote: > On Tue, Apr 22, 2008 at 03:04:48PM +1000, David Chinner wrote: > >> >> Christoph, care to explain how and why this is a problem to everyone? >> > > XFS has some heuristics for inode placement and of course for removing > the inode cluster and re-allocting it. I have a gut feeling that there > is a small chance to trigger a re-use via nfs operations. Dave's initial patch -- please correct me if I misunderstood it -- kept a per-AG counter which was used to initialise the generation number of new inodes. This counter incremented every time an inode was created in that AG, i.e. every mknod() mkdir() or creat() bumps the counter for the appropriate AG. So it served as a per-AG inode generation number, which given the fixed mapping between AG and inode# means it's just like a per-inode generation number but sparser. Thus the only way you could cause a re-use of an (inode,gen) tuple would be by cycling through 2^32-1 inode creations anywhere in the same AG before destroying and re-creating the original inode. That's as good a protection as 32bits is going to buy. So if Dave's first patch works the way I think it works, it should be just fine for NFS' purposes. Am I missing something? -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-04-28 6:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-04-22 1:58 [PATCH] Don't initialise new inode generation numbers to zero V2 David Chinner 2008-04-22 4:05 ` Greg Banks 2008-04-22 5:04 ` David Chinner 2008-04-25 8:57 ` Christoph Hellwig 2008-04-28 3:11 ` David Chinner 2008-04-28 5:59 ` Christoph Hellwig 2008-04-28 6:20 ` David Chinner 2008-04-28 6:25 ` Christoph Hellwig 2008-04-28 3:24 ` Greg Banks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox