* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: NeilBrown @ 2008-03-25 20:32 UTC
To: J. Bruce Fields, xfs
Cc: Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, March 26, 2008 6:09 am, J. Bruce Fields wrote:
> On Tue, Mar 25, 2008 at 09:59:58AM -0700, Adam Schrotenboer wrote:
>> Adam Schrotenboer wrote:
>>> Neil Brown wrote:
>>>> On Wednesday March 12, jesper.juhl@gmail.com wrote:
>>>>> On 12/03/2008, J. Bruce Fields <bfields@fieldses.org> wrote:
>>>>>> What was the exported filesystem?
>>>>> XFS
>>>>
>>>> It's a bit of a long shot, but could you try mounting the XFS file
>>>> system with
>>>>    -o ikeep
>>>>
>>>> and see if it makes a difference.
>>>>
>>>> When you have "ikeep", I can find the code that increments the
>>>> generation number between different uses of the one inode number.
>>>>
>>>> When you have "noikeep" (which I think is the default) it doesn't keep
>>>> the inode on disk when deleted and so (presumably) needs to generate a
>>>> random generation number for each use. But I cannot find the code
>>>> that does that. I'm probably not looking in the right place, but I
>>>> don't think it can hurt to try "-o ikeep".
>>>>
>>>> NeilBrown
>>>>
>>> Ok, I've unmounted and remounted with that option enabled
>>> (/proc/mounts confirms it's enabled). We'll see what happens.
>>
>> Well, it's been almost 2 weeks (11 days anyhow) and I am not seeing
>> the nfs_update_inode message in the syslogs of any of our compute
>> servers. I need to talk to the various people who work with them to
>> verify, but it looks like this problem has been resolved.
>
> That's a workaround, at least, but it's unfortunate if a special mount
> option is required to get correct behavior for nfs exports. Is there
> anything we can do?
>

I suggest taking it up with the XFS developers...

Dear XFS developers.
Adam (and Jesper, though that was some time ago) was having problems
with an XFS filesystem that was exported via NFS. The client would
occasionally report the message given in the subject line.
Examining the NFS code suggested that the most likely explanation
was that the generation number used in the file handle was the same
every time that the inode number was re-used.

Examining the XFS code suggested that when the 'ikeep' mount option was
used, the generation number is explicitly incremented for each
re-use, while without 'ikeep', no evidence of setting the generation
number could be found. Maybe it defaults to zero.

Experimental evidence suggests that setting 'ikeep' removes the symptom.

Question: Is it possible that without 'ikeep', XFS does not even try
to provide unique generation numbers? If this is the case, could it
please be fixed? If it is not the case, please help me find the code
responsible.

Thanks,
NeilBrown
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: Josef 'Jeff' Sipek @ 2008-03-25 21:24 UTC
To: NeilBrown
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, Mar 26, 2008 at 07:32:01AM +1100, NeilBrown wrote:
> I suggest taking it up with the XFS developers...
>
> Dear XFS developers.
> Adam (and Jesper, though that was some time ago) was having problems
> with an XFS filesystem that was exported via NFS. The client would
> occasionally report the message given in the subject line.
> Examining the NFS code suggested that the most likely explanation
> was that the generation number used in the file handle was the same
> every time that the inode number was re-used.
>
> Examining the XFS code suggested that when the 'ikeep' mount option was
> used, the generation number is explicitly incremented for each
> re-use, while without 'ikeep', no evidence of setting the generation
> number could be found. Maybe it defaults to zero.
>
> Experimental evidence suggests that setting 'ikeep' removes the symptom.
>
> Question: Is it possible that without 'ikeep', XFS does not even try
> to provide unique generation numbers? If this is the case, could it
> please be fixed? If it is not the case, please help me find the code
> responsible.

Unless you specify the "ikeep" mount option, XFS will remove unused inode
clusters. The newly freed blocks can then be used to store data or possibly
a new inode cluster. If the blocks get reused for inodes, you'll end up with
inodes whose generation numbers regressed (inode number = f(block number)).

Using the "ikeep" mount option causes XFS to _never_ free empty inode
clusters. This means that if you create many files and then unlink them,
you'll end up with many unused inodes that are still allocated (and taking
up disk space) but free to be used by the next creat(2)/mkdir(2)/etc.

This "problem" is inherent to any file system which dynamically allocates
inodes.

Josef 'Jeff' Sipek.

--
Linux, n.:
  Generous programmers from around the world all join forces to help you
  shoot yourself in the foot for free.
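The "inode number = f(block number)" point can be illustrated with a toy model. The sketch assumes a fixed number of inodes per chunk and an inode number computed directly from the chunk's starting block (a simplification; real XFS inode numbers also encode the AG and the offset within the block), and it zeroes a newly allocated chunk the way the pre-patch allocation code does. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define INODES_PER_CHUNK 64  /* illustrative; not an XFS constant */

/* Toy inode chunk: inode numbers are a function of the starting block. */
struct chunk {
    uint64_t start_block;
    uint32_t gen[INODES_PER_CHUNK];  /* per-inode generation numbers */
};

static uint64_t ino_of(const struct chunk *c, unsigned idx)
{
    return c->start_block * INODES_PER_CHUNK + idx;
}

/* Allocate a chunk at 'block', zeroing it, so every generation is 0. */
static void chunk_alloc_zeroed(struct chunk *c, uint64_t block)
{
    memset(c, 0, sizeof(*c));
    c->start_block = block;
}

/* Reuse an inode while its chunk stays allocated (the ikeep case):
 * the generation is bumped, so the new file gets a new (ino, gen) pair. */
static uint32_t reuse_inode(struct chunk *c, unsigned idx)
{
    return ++c->gen[idx];
}
```

While the chunk is kept, reuse_inode() moves inode 3 from generation 0 to 1. If instead the chunk is freed and a new chunk is later allocated at the same block, chunk_alloc_zeroed() hands out inode 3 at generation 0 again: the same pair the very first file had.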
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: NeilBrown @ 2008-03-25 21:38 UTC
To: Josef 'Jeff' Sipek
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, March 26, 2008 8:24 am, Josef 'Jeff' Sipek wrote:
> Unless you specify the "ikeep" mount option, XFS will remove unused inode
> clusters. The newly freed blocks can then be used to store data or
> possibly a new inode cluster. If the blocks get reused for inodes, you'll
> end up with inodes whose generation numbers regressed (inode number =
> f(block number)).
>
> Using the "ikeep" mount option causes XFS to _never_ free empty inode
> clusters. This means that if you create many files and then unlink them,
> you'll end up with many unused inodes that are still allocated (and
> taking up disk space) but free to be used by the next
> creat(2)/mkdir(2)/etc.
>
> This "problem" is inherent to any file system which dynamically allocates
> inodes.

Yes, I understand all that.

However you still need to do something about the generation number. It
must be set to something.

When you allocate an inode that doesn't currently exist on the device,
you obviously cannot increment the old value and use that.
However you can do a lot better than always using 0.

The simplest would be to generate a 'random' number (get_random_bytes).
Slightly better would be to generate a random number at boot time
and use that, incrementing it each time it is used to set the
generation number for an inode.
Even better would be to store that 'next generation number' in the
superblock so there would be even less risk of the 'random' generation
producing repeats.
This is what ext3 does. It doesn't dynamically allocate inodes,
but it doesn't want to pay the cost of reading an old inode from
storage just to see what the generation number is. So it has
a number in the superblock which is incremented on each inode allocation
and is used as the generation number.

Certainly anything would be better than always using the same number.

NeilBrown
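The "seed once, then increment on each allocation" scheme described above can be sketched as a small counter. This is a minimal user-space illustration under stated assumptions: the caller supplies the seed (in the kernel it might come from get_random_bytes(), or from a value kept in the superblock as suggested above), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical "next generation number" source. */
struct gen_source {
    uint32_t next_gen;
};

/* Seed once, e.g. from a random value at mount time or from a value
 * saved in the superblock at the previous unmount. */
static void gen_source_init(struct gen_source *gs, uint32_t seed)
{
    gs->next_gen = seed;
}

/* Hand out a generation number for a newly allocated inode. Unsigned
 * overflow wraps to 0, which is well defined in C. */
static uint32_t gen_source_next(struct gen_source *gs)
{
    return gs->next_gen++;
}

/* Value to save so the sequence can continue after a remount. */
static uint32_t gen_source_save(const struct gen_source *gs)
{
    return gs->next_gen;
}
```

Because every allocation draws a fresh value, an inode number that is freed and later reallocated gets a different generation unless the counter has wrapped all the way around in between. Saving and restoring next_gen lets the sequence continue across remounts instead of restarting from a new random seed.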
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: Josef 'Jeff' Sipek @ 2008-03-25 22:13 UTC
To: NeilBrown
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
...
> However you still need to do something about the generation number. It
> must be set to something.

Right.

> When you allocate an inode that doesn't currently exist on the device,
> you obviously cannot increment the old value and use that.

Makes sense.

> However you can do a lot better than always using 0.

I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)

		/*
		 * Set initial values for the inodes in this buffer.
		 */
		xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
		for (i = 0; i < ninodes; i++) {
			free = XFS_MAKE_IPTR(args.mp, fbuf, i);
			free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
			free->di_core.di_version = version;
			free->di_next_unlinked = cpu_to_be32(NULLAGINO);
			xfs_ialloc_log_di(tp, fbuf, i,
				XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
		}

xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
follows doesn't change the generation number, it'll stay 0.

> The simplest would be to generate a 'random' number (get_random_bytes).
> Slightly better would be to generate a random number at boot time
> and use that, incrementing it each time it is used to set the
> generation number for an inode.

I'm not familiar enough with NFS; do you want something that's monotonically
increasing or do you just test for inequality? If it is inequality, why not
just use something like the jiffies - that should be unique enough.

> Even better would be to store that 'next generation number' in the
> superblock so there would be even less risk of the 'random' generation
> producing repeats.
> This is what ext3 does. It doesn't dynamically allocate inodes,
> but it doesn't want to pay the cost of reading an old inode from
> storage just to see what the generation number is. So it has
> a number in the superblock which is incremented on each inode allocation
> and is used as the generation number.

Something tells me that the SGI folks might not be all too happy with the
in-sb number... XFS tries to be as parallel as possible, and this would
cause the counter variable to bounce around their NUMA systems. Perhaps a
per-ag variable would be better, but I remember reading that parallelizing
updates to some inode count variable (I forget which) in the superblock
\cite{dchinner-ols2006} led to a rather big improvement.

It's almost morning down under, so I guess we'll get their comments on this
soon.

Josef 'Jeff' Sipek.

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
		- Brian W. Kernighan
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: NeilBrown @ 2008-03-25 23:09 UTC
To: Josef 'Jeff' Sipek
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, March 26, 2008 9:13 am, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
>> However you still need to do something about the generation number. It
>> must be set to something.
>
> Right.
>
>> When you allocate an inode that doesn't currently exist on the device,
>> you obviously cannot increment the old value and use that.
>
> Makes sense.
>
>> However you can do a lot better than always using 0.
>
> I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)
>
> 		/*
> 		 * Set initial values for the inodes in this buffer.
> 		 */
> 		xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
> 		for (i = 0; i < ninodes; i++) {
> 			free = XFS_MAKE_IPTR(args.mp, fbuf, i);
> 			free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
> 			free->di_core.di_version = version;
> 			free->di_next_unlinked = cpu_to_be32(NULLAGINO);
> 			xfs_ialloc_log_di(tp, fbuf, i,
> 				XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
> 		}
>
> xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
> follows doesn't change the generation number, it'll stay 0.
>
>> The simplest would be to generate a 'random' number (get_random_bytes).
>> Slightly better would be to generate a random number at boot time
>> and use that, incrementing it each time it is used to set the
>> generation number for an inode.
>
> I'm not familiar enough with NFS; do you want something that's
> monotonically increasing or do you just test for inequality? If it is
> inequality, why not just use something like the jiffies - that should
> be unique enough.
>

What we need is for the "filehandle" to be stable and unique.
By 'stable' I mean that every time I get the filehandle for a particular
file, I get the same string of bytes.
By 'unique' I mean that if I get two filehandles for two different files,
they must differ in at least one bit.
If a file is deleted and the inode is re-used for a new file, then the
old and new files are different and must have different file handles.

The filehandle is traditionally generated from the inode number and a
generation number, but the filesystem can actually do whatever it likes.
xfs does it with xfs_fs_encode_fh().

Certainly you could initialise the i_generation to jiffies in
xfs_ialloc_ag_alloc. That would be a suitable fix.
get_random_bytes might be better, but the difference probably wouldn't be
noticeable.

NeilBrown
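The "stable and unique" requirement can be illustrated with a toy filehandle built from an inode number and a generation number. The 12-byte little-endian layout and the names below are assumptions for illustration only; they are not the format produced by xfs_fs_encode_fh().

```c
#include <stdint.h>
#include <string.h>

#define TOY_FH_LEN 12  /* 8 bytes of inode number + 4 bytes of generation */

/* Serialise (ino, gen) byte by byte so the result does not depend on
 * struct padding or host endianness. */
static void toy_encode_fh(uint8_t out[TOY_FH_LEN], uint64_t ino, uint32_t gen)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(ino >> (8 * i));
    for (int i = 0; i < 4; i++)
        out[8 + i] = (uint8_t)(gen >> (8 * i));
}

/* NFS clients treat filehandles as opaque byte strings. */
static int toy_fh_equal(const uint8_t a[TOY_FH_LEN], const uint8_t b[TOY_FH_LEN])
{
    return memcmp(a, b, TOY_FH_LEN) == 0;
}
```

Encoding the same file twice gives identical bytes (stable). A deleted file and its successor share the inode number, so only the generation can make their handles differ (unique); if both generations are 0, the handles collide.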
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: David Chinner @ 2008-03-26 3:37 UTC
To: Josef 'Jeff' Sipek
Cc: NeilBrown, J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Tue, Mar 25, 2008 at 06:13:21PM -0400, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
> > However you still need to do something about the generation number. It
> > must be set to something.
.....
> > Even better would be to store that 'next generation number' in the
> > superblock so there would be even less risk of the 'random' generation
> > producing repeats.
> > This is what ext3 does. It doesn't dynamically allocate inodes,
> > but it doesn't want to pay the cost of reading an old inode from
> > storage just to see what the generation number is. So it has
> > a number in the superblock which is incremented on each inode allocation
> > and is used as the generation number.
>
> Something tells me that the SGI folks might not be all too happy with the
> in-sb number...
.....
> Perhaps a per-ag variable would be better,

/me goes back to the bug from last year about stable inode/gen numbers
for a HSM.

dgc> Right, except the last thing we want is yet more global state needing to
dgc> be updated in inode allocation. The best way to do this is a max generation
dgc> number per AG (held in the AGI) so that it can be updated at the same time
dgc> inodes are freed and not cause additional serialisation.

Which was soundly rejected by the HSM folk because it wraps at 4 billion
inode create/unlink cycles in an AG rather than per inode. The only thing
they were happy with was the old behaviour and so they now mount their
filesystems with ikeep.

At that point the issue was dropped on the floor; the NFS side of things
apparently wasn't causing any problems so we didn't consider it urgent
to fix....

Given this state of affairs (i.e. HSM using ikeep), I guess we can do
anything we want for the noikeep case. I'll cook up a patch that does
something similar to ext3 generation numbers for the initial seeding....

> but I remember reading that parallelizing updates
> to some inode count variable (I forget which) in the superblock
> \cite{dchinner-ols2006} led to a rather big improvement.

That was for in-memory counters, not on-disk ones, and the problem really
was free block counts rather than free inode counts. Yes, I converted the
inode counters at the same time, but that wasn't the limiting factor.
Updates to the on-disk superblock, OTOH, are a limiting factor, and that
was what the lazy superblock counter modifications solved....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
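The objection that a per-AG counter "wraps at 4 billion inode create/unlink cycles in an AG rather than per inode" is easiest to see with rough arithmetic. The sketch below assumes a 32-bit counter, a hypothetical steady allocation rate for the whole AG, and (for the per-inode case) that reuse is spread evenly over n inode numbers; the rates are illustrative, not measurements.

```c
#define GEN_PERIOD 4294967296.0  /* 2^32 values of a 32-bit generation */

/* A shared per-AG counter advances on every allocation in the AG. */
static double per_ag_wrap_seconds(double ag_allocs_per_sec)
{
    return GEN_PERIOD / ag_allocs_per_sec;
}

/* A per-inode counter advances only when that inode number is reused;
 * with reuse spread evenly over n_inodes, each sees 1/n_inodes of the rate. */
static double per_inode_wrap_seconds(double ag_allocs_per_sec, double n_inodes)
{
    return GEN_PERIOD / (ag_allocs_per_sec / n_inodes);
}
```

At a hypothetical 10,000 allocations per second in one AG, the shared counter wraps in about 429,497 seconds (roughly five days), while each per-inode counter in a 1,000-inode working set would take about 1,000 times longer.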
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: David Chinner @ 2008-03-26 5:02 UTC
To: David Chinner
Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
> anything we want for the noikeep case. I'll cook up a patch that does
> something similar to ext3 generation numbers for the initial seeding....

Patch below for comments. It passes xfsqa, but there's no userspace
support for it yet. 2.6.26 is the likely target for this change.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
Don't initialise new inode generation numbers to zero

When we allocate new inode chunks, we initialise the generation
numbers to zero. This works fine until we delete a chunk and then
reallocate it, resulting in the same inode numbers but with a
reset generation count. This can result in inode/generation
pairs of different inodes occurring relatively close together.

Given that the inode/gen pair makes up the "unique" portion of
an NFS filehandle on XFS, this can result in file handles cached
on clients being seen on the wire from the server but referring to
a different file. This causes .... issues for NFS clients.

Hence we need a unique generation number initialisation for
each inode to prevent reuse of a small portion of the generation
number space. Make this initialiser per-allocation group so
that it is not a single point of contention in the filesystem,
and increment it on every allocation within an AG to reduce the
chance that a generation number is reused for a given inode number
if the inode chunk is deleted and reallocated immediately afterwards.

It is safe to add the agi_newinogen field to the AGI without
using a feature bit. If an older kernel is used, it simply will
not update the field on allocation. If the kernel is updated and
the field has garbage in it, then it's like having a random seed
to the generation number....

Signed-off-by: Dave Chinner <dgc@sgi.com>
---
 fs/xfs/xfs_ag.h     |    4 +++-
 fs/xfs/xfs_ialloc.c |   30 ++++++++++++++++++++++--------
 2 files changed, 25 insertions(+), 9 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h	2008-01-18 18:30:06.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h	2008-03-26 13:03:41.122918236 +1100
@@ -121,6 +121,7 @@ typedef struct xfs_agi {
 	 * still being referenced.
 	 */
 	__be32		agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
+	__be32		agi_newinogen;	/* inode cluster generation */
 } xfs_agi_t;
 
 #define	XFS_AGI_MAGICNUM	0x00000001
@@ -134,7 +135,8 @@ typedef struct xfs_agi {
 #define	XFS_AGI_NEWINO		0x00000100
 #define	XFS_AGI_DIRINO		0x00000200
 #define	XFS_AGI_UNLINKED	0x00000400
-#define	XFS_AGI_NUM_BITS	11
+#define	XFS_AGI_NEWINOGEN	0x00000800
+#define	XFS_AGI_NUM_BITS	12
 #define	XFS_AGI_ALL_BITS	((1 << XFS_AGI_NUM_BITS) - 1)
 
 /* disk block (xfs_daddr_t) in the AG */
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c	2008-03-25 15:41:27.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c	2008-03-26 14:29:47.998554368 +1100
@@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc(
 			free = XFS_MAKE_IPTR(args.mp, fbuf, i);
 			free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
 			free->di_core.di_version = version;
+			free->di_core.di_gen = agi->agi_newinogen;
+			be32_add_cpu(&agi->agi_newinogen, 1);
 			free->di_next_unlinked = cpu_to_be32(NULLAGINO);
 			xfs_ialloc_log_di(tp, fbuf, i,
 				XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
@@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc(
 	 * Log allocation group header fields
 	 */
 	xfs_ialloc_log_agi(tp, agbp,
-		XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO);
+		XFS_AGI_COUNT | XFS_AGI_FREECOUNT |
+		XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN);
 	/*
 	 * Modify/log superblock values for inode count and inode free count.
 	 */
@@ -896,11 +899,12 @@ nextag:
 	ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset);
 	XFS_INOBT_CLR_FREE(&rec, offset);
 	rec.ir_freecount--;
+	be32_add_cpu(&agi->agi_newinogen, 1);
 	if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount,
 			rec.ir_free)))
 		goto error0;
 	be32_add(&agi->agi_freecount, -1);
-	xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT);
+	xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN);
 	down_read(&mp->m_peraglock);
 	mp->m_perag[tagno].pagi_freecount--;
 	up_read(&mp->m_peraglock);
@@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels(
 
 /*
  * Log specified fields for the ag hdr (inode section)
+ *
+ * We don't log the unlinked inode fields through here; they
+ * get logged directly to the buffer. Hence we have a discontinuity
+ * in the fields we are logging and we need two calls to map all
+ * the dirtied parts of the agi....
  */
 void
 xfs_ialloc_log_agi(
@@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi(
 		offsetof(xfs_agi_t, agi_newino),
 		offsetof(xfs_agi_t, agi_dirino),
 		offsetof(xfs_agi_t, agi_unlinked),
+		offsetof(xfs_agi_t, agi_newinogen),
 		sizeof(xfs_agi_t)
 	};
+	int			log_newino = fields & XFS_AGI_NEWINOGEN;
+
 #ifdef DEBUG
 	xfs_agi_t		*agi;	/* allocation group header */
 
 	agi = XFS_BUF_TO_AGI(bp);
 	ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
 #endif
-	/*
-	 * Compute byte offsets for the first and last fields.
-	 */
+	fields &= ~XFS_AGI_NEWINOGEN;
+
+	/* Compute byte offsets for the first and last fields. */
 	xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last);
-	/*
-	 * Log the allocation group inode header buffer.
-	 */
 	xfs_trans_log_buf(tp, bp, first, last);
+	if (log_newino) {
+		xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS,
+				&first, &last);
+		xfs_trans_log_buf(tp, bp, first, last);
+	}
 }
 
 /*
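The core idea of the patch, a per-AG counter that seeds each inode of a new chunk and advances on every allocation, can be modelled in user space. This is a simplified sketch, not the kernel code: the names are hypothetical, values are host-endian rather than __be32, and the journalling done by xfs_ialloc_log_agi() is left out.

```c
#include <stdint.h>

#define TOY_CHUNK_INODES 64  /* illustrative chunk size */

/* Stand-in for the AGI field the patch adds (agi_newinogen). */
struct toy_agi {
    uint32_t newinogen;
};

struct toy_chunk {
    uint32_t gen[TOY_CHUNK_INODES];
};

/* New chunk: seed each inode from the AG counter, advancing it once per
 * inode, as the patched loop in xfs_ialloc_ag_alloc() does. */
static void toy_chunk_alloc(struct toy_agi *agi, struct toy_chunk *c)
{
    for (int i = 0; i < TOY_CHUNK_INODES; i++)
        c->gen[i] = agi->newinogen++;
}

/* Taking a free inode from an existing chunk also advances the counter,
 * like the hunk at the nextag: label. */
static void toy_inode_alloc(struct toy_agi *agi)
{
    agi->newinogen++;
}
```

For example, starting from a counter of 0, inode 3 of the first chunk gets generation 3; after one more allocation from that chunk and a reallocation of the chunk, inode 3 gets 68 (64 + 1 + 3) rather than 0 again.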
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: Adam Schrotenboer @ 2008-04-17 19:37 UTC
To: David Chinner
Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs, Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

David Chinner wrote:
> On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
>
>> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
>> anything we want for the noikeep case. I'll cook up a patch that does
>> something similar to ext3 generation numbers for the initial seeding....
>>
>
> Patch below for comments. It passes xfsqa, but there's no userspace
> support for it yet. 2.6.26 is the likely target for this change.
>

The 2.6.26 merge window begins now. Has this been pushed yet? Is it in
the linux-next tree?
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
From: Timothy Shimmin @ 2008-03-26 3:27 UTC
To: NeilBrown
Cc: Josef 'Jeff' Sipek, J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan

Hi Neil,

NeilBrown wrote:
> On Wed, March 26, 2008 8:24 am, Josef 'Jeff' Sipek wrote:
>
>> Unless you specify the "ikeep" mount option, XFS will remove unused inode
>> clusters. The newly freed blocks can then be used to store data or
>> possibly a new inode cluster. If the blocks get reused for inodes, you'll
>> end up with inodes whose generation numbers regressed (inode number =
>> f(block number)).
>>
>> Using the "ikeep" mount option causes XFS to _never_ free empty inode
>> clusters. This means that if you create many files and then unlink them,
>> you'll end up with many unused inodes that are still allocated (and
>> taking up disk space) but free to be used by the next
>> creat(2)/mkdir(2)/etc.
>>
>> This "problem" is inherent to any file system which dynamically allocates
>> inodes.
>
> Yes, I understand all that.
>
> However you still need to do something about the generation number. It
> must be set to something.
>
> When you allocate an inode that doesn't currently exist on the device,
> you obviously cannot increment the old value and use that.
> However you can do a lot better than always using 0.
>

Yes, this is a known problem.
We came across it in about August last year, I believe, in the context
of DMF, as it wants to keep persistent file handles with gen#s in them:
  SGI bug: 969192: Default mount option "noikeep" makes the inode
  generation number non-persistent

I vaguely remember at the time that a number of different schemes
were tossed around but in the end we just turned off the ikeep for
DMAPI mounted filesystems.
I thought we had a bug open to do a real fix but can't see it at
the moment. Will look into it and discuss with our group.

Cheers,
--Tim