* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
[not found] ` <20080325190943.GF2237@fieldses.org>
@ 2008-03-25 20:32 ` NeilBrown
2008-03-25 21:24 ` Josef 'Jeff' Sipek
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2008-03-25 20:32 UTC
To: J. Bruce Fields, xfs
Cc: Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs,
Thomas Daniel, Frederic Revenu, Jeff Doan
On Wed, March 26, 2008 6:09 am, J. Bruce Fields wrote:
> On Tue, Mar 25, 2008 at 09:59:58AM -0700, Adam Schrotenboer wrote:
>> Adam Schrotenboer wrote:
>>> Neil Brown wrote:
>>>> On Wednesday March 12, jesper.juhl@gmail.com wrote:
>>>>> On 12/03/2008, J. Bruce Fields <bfields@fieldses.org> wrote:
>>>>>> What was the exported filesystem?
>>>>> XFS
>>>>
>>>> It's a bit of a long shot, but could you try mounting the XFS file
>>>> system with
>>>> -o ikeep
>>>>
>>>> and see if it makes a difference.
>>>>
>>>> When you have "ikeep", I can find the code that increments the
>>>> generation number between different uses of the one inode number.
>>>>
>>>> When you have "noikeep" (which I think is the default) it doesn't keep
>>>> the inode on disk when deleted and so (presumably) needs to generate a
>>>> random generation number for each use. But I cannot find the code
>>>> that does that. I'm probably not looking in the right place, but I
>>>> don't think it can hurt to try "-o ikeep".
>>>>
>>>> NeilBrown
>>>>
>>> Ok, I've unmounted and remounted with that option enabled
>>> (/proc/mounts confirms it's enabled). We'll see what happens.
>>
>> Well, it's been almost 2 weeks (11 days anyhow) and I am not seeing
>> the nfs_update_inode message in the syslogs of any of our compute
>> servers. I need to talk to the various people who work with them to
>> verify, but it looks like this problem has been resolved.
>
> That's a workaround, at least, but it's unfortunate if a special mount
> option is required to get correct behavior for nfs exports. Is there
> anything we can do?
>
I suggest taking it up with the XFS developers...
Dear XFS developers.
Adam (and Jesper, though that was some time ago) was having problems
with an XFS filesystem that was exported via NFS. The client would
occasionally report the message given in the subject line.
Examining the NFS code suggested that the most likely explanation
was that the generation number used in the file handle was the same
every time that the inode number was re-used.
Examining the XFS code suggested that when the 'ikeep' mount option was
used, the generation number would be explicitly incremented for each
re-use, while without 'ikeep', no evidence of setting the generation
number could be found. Maybe it defaults to zero.
Experimental evidence suggests that setting 'ikeep' removes the symptom.
Question: Is it possible that without 'ikeep', XFS does not even try
to provide unique generation numbers? If this is the case, could it
please be fixed. If it is not the case, please help me find the code
responsible.
Thanks,
NeilBrown
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 20:32 ` [opensuse] nfs_update_inode: inode X mode changed, Y to Z NeilBrown
@ 2008-03-25 21:24 ` Josef 'Jeff' Sipek
2008-03-25 21:38 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Josef 'Jeff' Sipek @ 2008-03-25 21:24 UTC
To: NeilBrown
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu,
Jeff Doan
On Wed, Mar 26, 2008 at 07:32:01AM +1100, NeilBrown wrote:
> I suggest taking it up with the XFS developers...
>
> Dear XFS developers.
> Adam (and Jesper, though that was some time ago) was having problems
> with an XFS filesystem that was exported via NFS. The client would
> occasionally report the message given in the subject line.
> Examining the NFS code suggested that the most likely explanation
> was that the generation number used in the file handle was the same
> every time that the inode number was re-used.
>
> Examining the XFS code suggested that when the 'ikeep' mount option was
> used, the generation number would be explicitly incremented for each
> re-use, while without 'ikeep', no evidence of setting the generation
> number could be found. Maybe it defaults to zero.
>
> Experimental evidence suggests that setting 'ikeep' removes the symptom.
>
> Question: Is it possible that without 'ikeep', XFS does not even try
> to provide unique generation numbers? If this is the case, could it
> please be fixed. If it is not the case, please help me find the code
> responsible.
Unless you specify the "ikeep" mount option, XFS will remove unused inode
clusters. The newly freed blocks can be then used to store data or possibly
a new inode cluster. If the blocks get reused for inodes, you'll end up
with inodes whose generation numbers regressed. (inode number = f(block
number))
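To make the "inode number = f(block number)" point concrete, here is a minimal userspace sketch. It assumes the usual XFS encoding, where an inode number packs the allocation group number, the block within the AG, and the inode's slot within that block. The names `struct geom` and `make_ino` are hypothetical and the geometry values are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical geometry, passed explicitly instead of read from a mount. */
struct geom {
    unsigned agblklog;   /* log2(blocks per AG), rounded up */
    unsigned inopblog;   /* log2(inodes per block) */
};

/* Build an inode number from its physical location. */
uint64_t make_ino(const struct geom *g, uint32_t agno,
                  uint32_t agbno, uint32_t offset)
{
    /* AG-relative inode number: block within the AG, then slot in block */
    uint64_t agino = ((uint64_t)agbno << g->inopblog) | offset;

    /* The AG number goes in the bits above the AG-relative part */
    return ((uint64_t)agno << (g->agblklog + g->inopblog)) | agino;
}
```

Because the number depends only on location, a cluster that is freed and later reallocated on the same blocks hands out exactly the same inode numbers again.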
Using the "ikeep" mount option causes XFS to _never_ free empty inode clusters.
This means that if you create many files and then unlink them, you'll end up
with many unused inodes that are still allocated (and taking up disk space)
but free to be used by the next creat(2)/mkdir(2)/etc..
This "problem" is inherent to any file system which dynamically allocates
inodes.
Josef 'Jeff' Sipek.
--
Linux, n.:
Generous programmers from around the world all join forces to help
you shoot yourself in the foot for free.
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 21:24 ` Josef 'Jeff' Sipek
@ 2008-03-25 21:38 ` NeilBrown
2008-03-25 22:13 ` Josef 'Jeff' Sipek
2008-03-26 3:27 ` Timothy Shimmin
0 siblings, 2 replies; 9+ messages in thread
From: NeilBrown @ 2008-03-25 21:38 UTC
To: Josef 'Jeff' Sipek
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
Trond Myklebust, lkml, linux-nfs, Thomas Daniel, Frederic Revenu,
Jeff Doan
On Wed, March 26, 2008 8:24 am, Josef 'Jeff' Sipek wrote:
> Unless you specify the "ikeep" mount option, XFS will remove unused inode
> clusters. The newly freed blocks can be then used to store data or
> possibly
> a new inode cluster. If the blocks get reused for inodes, you'll end up
> with inodes whose generation numbers regressed. (inode number = f(block
> number))
>
> Using the "ikeep" mount option causes XFS to _never_ free empty inode
> clusters.
> This means that if you create many files and then unlink them, you'll end
> up
> with many unused inodes that are still allocated (and taking up disk
> space)
> but free to be used by the next creat(2)/mkdir(2)/etc..
>
> This "problem" is inherent to any file system which dynamically allocates
> inodes.
Yes, I understand all that.
However you still need to do something about the generation number. It
must be set to something.
When you allocate an inode that doesn't currently exist on the device,
you obviously cannot increment the old value and use that.
However you can do a lot better than always using 0.
The simplest would be to generate a 'random' number (get_random_bytes).
Slightly better would be to generate a random number at boot time
and use that, incrementing it each time it is used to set the
generation number for an inode.
Even better would be to store that 'next generation number' in the
superblock so there would be even less risk of the 'random' generation
producing repeats.
This is what ext3 does. It doesn't dynamically allocate inodes,
but it doesn't want to pay the cost of reading an old inode from
storage just to see what the generation number is. So it has
a number in the superblock which is incremented on each inode allocation
and is used as the generation number.
Certainly anything would be better than always using the same number.
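As a rough illustration of the counter-based options, here is a minimal userspace sketch of a generation source that is seeded once and incremented on every inode allocation. `struct gen_source` and both functions are hypothetical names; in a kernel the seed would come from something like get_random_bytes, and a persistent variant would also write the counter back to the superblock.

```c
#include <stdint.h>

/* Hypothetical per-filesystem state holding the next generation number. */
struct gen_source {
    uint32_t next_gen;
};

/* Seed once (for example at mount time) from a caller-supplied random value. */
void gen_source_init(struct gen_source *gs, uint32_t random_seed)
{
    gs->next_gen = random_seed;
}

/* Hand out a generation number for a newly allocated inode. */
uint32_t gen_source_next(struct gen_source *gs)
{
    /* Unsigned arithmetic, so wrapping past 2^32 - 1 is well defined. */
    return gs->next_gen++;
}
```

Two consecutive allocations never get the same value, so an inode number that is reused straight away still ends up with a different generation.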
NeilBrown
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 21:38 ` NeilBrown
@ 2008-03-25 22:13 ` Josef 'Jeff' Sipek
2008-03-25 23:09 ` NeilBrown
2008-03-26 3:37 ` David Chinner
2008-03-26 3:27 ` Timothy Shimmin
1 sibling, 2 replies; 9+ messages in thread
From: Josef 'Jeff' Sipek @ 2008-03-25 22:13 UTC
To: NeilBrown
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
Frederic Revenu, Jeff Doan
On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
...
> However you still need to do something about the generation number. It
> must be set to something.
Right.
> When you allocate an inode that doesn't currently exist on the device,
> you obviously cannot increment the old value and use that.
Makes sense.
> However you can do a lot better than always using 0.
I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)
    /*
     * Set initial values for the inodes in this buffer.
     */
    xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
    for (i = 0; i < ninodes; i++) {
        free = XFS_MAKE_IPTR(args.mp, fbuf, i);
        free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
        free->di_core.di_version = version;
        free->di_next_unlinked = cpu_to_be32(NULLAGINO);
        xfs_ialloc_log_di(tp, fbuf, i,
                          XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
    }
xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
follows doesn't change the generation number, it'll stay 0.
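The same effect can be reproduced outside the kernel. This is a minimal sketch only: `struct fake_dinode`, its fields, and the constants are hypothetical stand-ins and do not match the real on-disk inode layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an on-disk inode; not the real XFS layout. */
struct fake_dinode {
    uint16_t magic;
    uint8_t  version;
    uint32_t gen;             /* generation number */
    uint32_t next_unlinked;
};

#define FAKE_MAGIC   0x494e       /* illustrative magic value */
#define FAKE_NULLINO 0xffffffffu  /* illustrative "no inode" marker */

/* Same shape as the kernel loop: zero the buffer, then set a few fields. */
void init_chunk(struct fake_dinode *buf, size_t ninodes, uint8_t version)
{
    memset(buf, 0, ninodes * sizeof(*buf));
    for (size_t i = 0; i < ninodes; i++) {
        buf[i].magic = FAKE_MAGIC;
        buf[i].version = version;
        buf[i].next_unlinked = FAKE_NULLINO;
        /* gen is never assigned, so it keeps the 0 left by memset() */
    }
}
```

Whatever the buffer held before, every inode in a freshly initialised chunk comes out with generation 0.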
> The simplest would be to generate a 'random' number (get_random_bytes).
> Slightly better would be to generate a random number at boot time
> and use that, incrementing it each time it is used to set the
> generation number for an inode.
I'm not familiar enough with NFS, do you want something that's monotonically
increasing or do you just test for inequality? If it is inequality, why not
just use something like the jiffies - that should be unique enough.
> Even better would be to store that 'next generation number' in the
> superblock so there would be even less risk of the 'random' generation
> producing repeats.
> This is what ext3 does. It doesn't dynamically allocate inodes,
> but it doesn't want to pay the cost of reading an old inode from
> storage just to see what the generation number is. So it has
> a number in the superblock which is incremented on each inode allocation
> and is used as the generation number.
Something tells me that the SGI folks might not be all too happy with the
in-sb number...XFS tries to be as parallel as possible, and this would cause
the counter variable to bounce around their NUMA systems. Perhaps a per-ag
variable would be better, but I remember reading that parallelizing updates
to some inode count variable (I forget which) in the superblock
\cite{dchinner-ols2006} led to a rather big improvement. It's almost
morning down under, so I guess we'll get their comments on this soon.
Josef 'Jeff' Sipek.
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
- Brian W. Kernighan
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 22:13 ` Josef 'Jeff' Sipek
@ 2008-03-25 23:09 ` NeilBrown
2008-03-26 3:37 ` David Chinner
1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2008-03-25 23:09 UTC
To: Josef 'Jeff' Sipek
Cc: J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
Frederic Revenu, Jeff Doan
On Wed, March 26, 2008 9:13 am, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
>> However you still need to do something about the generation number. It
>> must be set to something.
>
> Right.
>
>> When you allocate an inode that doesn't currently exist on the device,
>> you obviously cannot increment the old value and use that.
>
> Makes sense.
>
>> However you can do a lot better than always using 0.
>
> I looked at the code (xfs_ialloc.c:xfs_ialloc_ag_alloc)
>
>     /*
>      * Set initial values for the inodes in this buffer.
>      */
>     xfs_biozero(fbuf, 0, ninodes << args.mp->m_sb.sb_inodelog);
>     for (i = 0; i < ninodes; i++) {
>         free = XFS_MAKE_IPTR(args.mp, fbuf, i);
>         free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
>         free->di_core.di_version = version;
>         free->di_next_unlinked = cpu_to_be32(NULLAGINO);
>         xfs_ialloc_log_di(tp, fbuf, i,
>                           XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
>     }
>
> xfs_biozero(...) turns into a memset(buf, 0, len), and since the loop that
> follows doesn't change the generation number, it'll stay 0.
>
>> The simplest would be to generate a 'random' number (get_random_bytes).
>> Slightly better would be to generate a random number at boot time
>> and use that, incrementing it each time it is used to set the
>> generation number for an inode.
>
> I'm not familiar enough with NFS, do you want something that's
> monotonically
> increasing or do you just test for inequality? If it is inequality, why
> not
> just use something like the jiffies - that should be unique enough.
>
What we need is for the "filehandle" to be stable and unique.
By 'stable' I mean that every time I get the filehandle for a particular
file, I get the same string of bytes.
By 'unique' I mean that if I get two filehandles for two different
files, they must differ in at least one bit.
If a file is deleted and the inode is re-used for a new file, then the
old and new files are different and must have different file handles.
The filehandle is traditionally generated from the inode number and
a generation number, but the filesystem can actually do whatever it
likes. xfs does it with xfs_fs_encode_fh().
Certainly you could initialise the i_generation to jiffies in
xfs_ialloc_ag_alloc. That would be a suitable fix. get_random_bytes
might be better, but the difference probably wouldn't be noticeable.
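The two requirements can be shown with a toy model. In this sketch `struct fake_fh`, `make_fh` and `fh_equal` are hypothetical names, and the handle is reduced to just the inode/generation pair; a real filehandle carries more than that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy filehandle: only the inode number and the generation number. */
struct fake_fh {
    uint64_t ino;
    uint32_t gen;
};

/* Stable: the same (ino, gen) always encodes to the same handle. */
struct fake_fh make_fh(uint64_t ino, uint32_t gen)
{
    struct fake_fh fh = { ino, gen };
    return fh;
}

/* Unique: two handles name the same file only if both fields match. */
bool fh_equal(const struct fake_fh *a, const struct fake_fh *b)
{
    return a->ino == b->ino && a->gen == b->gen;
}
```

If inode 131 is deleted and reused with the generation reset to 0, the new file's handle compares equal to the handle a client cached for the old file. With any different generation the stale handle no longer matches.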
NeilBrown
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 21:38 ` NeilBrown
2008-03-25 22:13 ` Josef 'Jeff' Sipek
@ 2008-03-26 3:27 ` Timothy Shimmin
1 sibling, 0 replies; 9+ messages in thread
From: Timothy Shimmin @ 2008-03-26 3:27 UTC
To: NeilBrown
Cc: Josef 'Jeff' Sipek, J. Bruce Fields, xfs,
Adam Schrotenboer, Jesper Juhl, Trond Myklebust, lkml, linux-nfs,
Thomas Daniel, Frederic Revenu, Jeff Doan
Hi Neil,
NeilBrown wrote:
> On Wed, March 26, 2008 8:24 am, Josef 'Jeff' Sipek wrote:
>
>> Unless you specify the "ikeep" mount option, XFS will remove unused inode
>> clusters. The newly freed blocks can be then used to store data or
>> possibly
>> a new inode cluster. If the blocks get reused for inodes, you'll end up
>> with inodes whose generation numbers regressed. (inode number = f(block
>> number))
>>
>> Using the "ikeep" mount option causes XFS to _never_ free empty inode
>> clusters.
>> This means that if you create many files and then unlink them, you'll end
>> up
>> with many unused inodes that are still allocated (and taking up disk
>> space)
>> but free to be used by the next creat(2)/mkdir(2)/etc..
>>
>> This "problem" is inherent to any file system which dynamically allocates
>> inodes.
>
> Yes, I understand all that.
>
> However you still need to do something about the generation number. It
> must be set to something.
>
> When you allocate an inode that doesn't currently exist on the device,
> you obviously cannot increment the old value and use that.
> However you can do a lot better than always using 0.
>
Yes, this is a known problem.
We came across it in about August last year I believe in the context of
DMF as it wants to keep persistent file handles with gen#s in them:
SGI bug:
969192: Default mount option "noikeep" makes the inode generation number non-persistent
I vaguely remember at the time that a number of different schemes were
tossed around but in the end we just turned on ikeep
for DMAPI mounted filesystems.
I thought we had a bug open to do a real fix but can't see
it at the moment. Will look into it and discuss with our group.
Cheers,
--Tim
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-25 22:13 ` Josef 'Jeff' Sipek
2008-03-25 23:09 ` NeilBrown
@ 2008-03-26 3:37 ` David Chinner
2008-03-26 5:02 ` David Chinner
1 sibling, 1 reply; 9+ messages in thread
From: David Chinner @ 2008-03-26 3:37 UTC
To: Josef 'Jeff' Sipek
Cc: NeilBrown, J. Bruce Fields, xfs, Adam Schrotenboer, Jesper Juhl,
Trond Myklebust, linux-kernel, linux-nfs, Thomas Daniel,
Frederic Revenu, Jeff Doan
On Tue, Mar 25, 2008 at 06:13:21PM -0400, Josef 'Jeff' Sipek wrote:
> On Wed, Mar 26, 2008 at 08:38:22AM +1100, NeilBrown wrote:
> ...
> > However you still need to do something about the generation number. It
> > must be set to something.
.....
> > Even better would be to store that 'next generation number' in the
> > superblock so there would be even less risk of the 'random' generation
> > producing repeats.
> > This is what ext3 does. It doesn't dynamically allocate inodes,
> > but it doesn't want to pay the cost of reading an old inode from
> > storage just to see what the generation number is. So it has
> > a number in the superblock which is incremented on each inode allocation
> > and is used as the generation number.
>
> Something tells me that the SGI folks might not be all too happy with the
> in-sb number...
.....
> Perhaps a per-ag variable would be better,
/me goes back to the bug from last year about stable inode/gen numbers
for a HSM.
dgc> Right, except the last thing we want is yet more global state needing to
dgc> be updated in inode allocation. The best way to do this is a max generation
dgc> number per AG (held in the AGI) so that it can be updated at the same time
dgc> inodes are freed and not cause additional serialisation.
Which was soundly rejected by the HSM folk because it wraps at 4 billion
inode create/unlink cycles in an AG rather than per inode. The only thing
they were happy with was the old behaviour and so they now mount their
filesystems with ikeep. At that point the issue was dropped on the floor;
the NFS side of things apparently weren't causing any problems so we didn't
consider it urgent to fix....
Given this state of affairs (i.e. HSM using ikeep), I guess we can do
anything we want for the noikeep case. I'll cook up a patch that does
something similar to ext3 generation numbers for the initial seeding....
> but I remember reading that parallelizing updates
> to some inode count variable (I forget which) in the superblock
> \cite{dchinner-ols2006} led to a rather big improvement.
That was for in memory counters not on disk, and the problem really was
free block counts rather than free inode counts. Yes, I converted the
inode counters at the same time, but that wasn't the limiting factor.
Updates to the on disk superblock, OTOH, are a limiting factor and
that was what the lazy superblock counter modifications solve....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-26 3:37 ` David Chinner
@ 2008-03-26 5:02 ` David Chinner
2008-04-17 19:37 ` Adam Schrotenboer
0 siblings, 1 reply; 9+ messages in thread
From: David Chinner @ 2008-03-26 5:02 UTC
To: David Chinner
Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs,
Adam Schrotenboer, Jesper Juhl, Trond Myklebust, linux-kernel,
linux-nfs, Thomas Daniel, Frederic Revenu, Jeff Doan
On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
> anything we want for the noikeep case. I'll cook up a patch that does
> something similar to ext3 generation numbers for the initial seeding....
Patch below for comments. It passes xfsqa, but there's no userspace
support for it yet. 2.6.26 is the likely target for this change.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
Don't initialise new inode generation numbers to zero
When we allocate new inode chunks, we initialise the generation
numbers to zero. This works fine until we delete a chunk and then
reallocate it, resulting in the same inode numbers but with a
reset generation count. This can result in inode/generation
pairs of different inodes occurring relatively close together.
Given that the inode/gen pair makes up the "unique" portion of
an NFS filehandle on XFS, this can result in file handles cached
on clients being seen on the wire from the server but referring to
a different file. This causes .... issues for NFS clients.
Hence we need a unique generation number initialisation for
each inode to prevent reuse of a small portion of the generation
number space. Make this initialiser per-allocation group so
that it is not a single point of contention in the filesystem,
and increment it on every allocation within an AG to reduce the
chance that a generation number is reused for a given inode number
if the inode chunk is deleted and reallocated immediately
afterwards.
It is safe to add the agi_newinogen field to the AGI without
using a feature bit. If an older kernel is used, it simply
will not update the field on allocation. If the kernel is
updated and the field has garbage in it, then it's like having a
random seed to the generation number....
Signed-off-by: Dave Chinner <dgc@sgi.com>
---
fs/xfs/xfs_ag.h | 4 +++-
fs/xfs/xfs_ialloc.c | 30 ++++++++++++++++++++++--------
2 files changed, 25 insertions(+), 9 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2008-01-18 18:30:06.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2008-03-26 13:03:41.122918236 +1100
@@ -121,6 +121,7 @@ typedef struct xfs_agi {
* still being referenced.
*/
__be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
+ __be32 agi_newinogen; /* inode cluster generation */
} xfs_agi_t;
#define XFS_AGI_MAGICNUM 0x00000001
@@ -134,7 +135,8 @@ typedef struct xfs_agi {
#define XFS_AGI_NEWINO 0x00000100
#define XFS_AGI_DIRINO 0x00000200
#define XFS_AGI_UNLINKED 0x00000400
-#define XFS_AGI_NUM_BITS 11
+#define XFS_AGI_NEWINOGEN 0x00000800
+#define XFS_AGI_NUM_BITS 12
#define XFS_AGI_ALL_BITS ((1 << XFS_AGI_NUM_BITS) - 1)
/* disk block (xfs_daddr_t) in the AG */
Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-25 15:41:27.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-26 14:29:47.998554368 +1100
@@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc(
free = XFS_MAKE_IPTR(args.mp, fbuf, i);
free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
free->di_core.di_version = version;
+ free->di_core.di_gen = agi->agi_newinogen;
+ be32_add_cpu(&agi->agi_newinogen, 1);
free->di_next_unlinked = cpu_to_be32(NULLAGINO);
xfs_ialloc_log_di(tp, fbuf, i,
XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
@@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc(
* Log allocation group header fields
*/
xfs_ialloc_log_agi(tp, agbp,
- XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO);
+ XFS_AGI_COUNT | XFS_AGI_FREECOUNT |
+ XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN);
/*
* Modify/log superblock values for inode count and inode free count.
*/
@@ -896,11 +899,12 @@ nextag:
ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset);
XFS_INOBT_CLR_FREE(&rec, offset);
rec.ir_freecount--;
+ be32_add_cpu(&agi->agi_newinogen, 1);
if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount,
rec.ir_free)))
goto error0;
be32_add(&agi->agi_freecount, -1);
- xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT);
+ xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN);
down_read(&mp->m_peraglock);
mp->m_perag[tagno].pagi_freecount--;
up_read(&mp->m_peraglock);
@@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels(
/*
* Log specified fields for the ag hdr (inode section)
+ *
+ * We don't log the unlinked inode fields through here; they
+ * get logged directly to the buffer. Hence we have a discontinuity
+ * in the fields we are logging and we need two calls to map all
+ * the dirtied parts of the agi....
*/
void
xfs_ialloc_log_agi(
@@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi(
offsetof(xfs_agi_t, agi_newino),
offsetof(xfs_agi_t, agi_dirino),
offsetof(xfs_agi_t, agi_unlinked),
+ offsetof(xfs_agi_t, agi_newinogen),
sizeof(xfs_agi_t)
};
+ int log_newino = fields & XFS_AGI_NEWINOGEN;
+
#ifdef DEBUG
xfs_agi_t *agi; /* allocation group header */
agi = XFS_BUF_TO_AGI(bp);
ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
#endif
- /*
- * Compute byte offsets for the first and last fields.
- */
+ fields &= ~XFS_AGI_NEWINOGEN;
+
+ /* Compute byte offsets for the first and last fields. */
xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last);
- /*
- * Log the allocation group inode header buffer.
- */
xfs_trans_log_buf(tp, bp, first, last);
+ if (log_newino) {
+ xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS,
+ &first, &last);
+ xfs_trans_log_buf(tp, bp, first, last);
+ }
}
/*
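The reason for the second xfs_trans_log_buf() call can be illustrated in userspace. The sketch below assumes xfs_btree_offsets() yields the byte span from the lowest to the highest flagged field. `field_range`, the `F_*` flags and the offsets in `demo_offsets` are hypothetical stand-ins, not the real AGI layout.

```c
#include <stdint.h>

/* Hypothetical field flags, one bit per field in layout order. */
enum {
    F_A        = 1 << 0,
    F_B        = 1 << 1,
    F_C        = 1 << 2,
    F_UNLINKED = 1 << 3,   /* large array that is logged separately */
    F_NEWGEN   = 1 << 4
};

/* Illustrative byte offsets of each field, plus the total size at the end. */
const short demo_offsets[] = { 0, 4, 8, 12, 268, 272 };

/*
 * Compute the byte range running from the lowest flagged field to the
 * highest flagged field. Leaves -1 in both outputs if no field is flagged.
 */
void field_range(uint32_t fields, const short *offsets, int nbits,
                 int *first, int *last)
{
    *first = -1;
    *last = -1;
    for (int i = 0; i < nbits; i++) {
        if (fields & (1u << i)) {
            if (*first < 0)
                *first = offsets[i];
            *last = offsets[i + 1] - 1;   /* end of field i */
        }
    }
}
```

With this layout, flagging F_C and F_NEWGEN together gives the single range 8..271, which also covers the whole unlinked array in between. Computing the two ranges separately gives 8..11 and 268..271, so only the fields that were really dirtied get logged.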
* Re: [opensuse] nfs_update_inode: inode X mode changed, Y to Z
2008-03-26 5:02 ` David Chinner
@ 2008-04-17 19:37 ` Adam Schrotenboer
0 siblings, 0 replies; 9+ messages in thread
From: Adam Schrotenboer @ 2008-04-17 19:37 UTC
To: David Chinner
Cc: Josef 'Jeff' Sipek, NeilBrown, J. Bruce Fields, xfs,
Jesper Juhl, Trond Myklebust, linux-kernel, linux-nfs,
Thomas Daniel, Frederic Revenu, Jeff Doan
David Chinner wrote:
> On Wed, Mar 26, 2008 at 02:37:38PM +1100, David Chinner wrote:
>
>> Given this state of affairs (i.e. HSM using ikeep), I guess we can do
>> anything we want for the noikeep case. I'll cook up a patch that does
>> something similar to ext3 generation numbers for the initial seeding....
>>
>
> Patch below for comments. It passes xfsqa, but there's no userspace
> support for it yet. 2.6.26 is the likely target for this change.
>
The 2.6.26 merge window begins now. Has this been pushed yet? Is it in
the linux-next tree?