* TAKE 969192: Default mount option "noikeep" makes the inode generation number non-persistent
From: Vlad Apostolov @ 2007-08-24 4:01 UTC
To: sgi.bugs.xfs; +Cc: linux-xfs
XFS inodes are dynamically allocated on demand, rather than being allocated
at mkfs time. Chunks of 64 inodes are allocated at once, but they are
never freed. Over time, this can lead to filesystem fragmentation: clusters
of inodes and the btrees which point at them can be scattered around the
system.
By freeing clusters as they are emptied, we reduce fragmentation of the
free space after removing files. This in turn allows us to make better
placement decisions when repopulating a filesystem. The XFSMNT_IDELETE
mount option enables freeing clusters when they get empty.
Unfortunately, a side effect of freeing inode clusters is that the inode
generation numbers of such inodes are reset to zero when the cluster
is reclaimed. This is a problem in particular for a DMAPI-enabled filesystem,
as the DMAPI handles need to be unique and persistent in time. A unique
DMAPI handle is built with the help of the inode generation number. When the
generation number is prematurely reset by an inode cluster reclaim, there is
a high probability that inodes of different generations end up with identical
DMAPI handles.
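
The collision can be illustrated with a small sketch, written in Python for
brevity. The handle layout (an inode number paired with a generation) and the
names Inode, make_handle, reuse and reclaim_and_reallocate are illustrative
assumptions, not the XFS or DMAPI implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Inode:
    ino: int  # inode number
    gen: int  # generation number, assumed to be bumped on each reuse


def make_handle(inode: Inode) -> Tuple[int, int]:
    """Hypothetical handle: the (inode number, generation) pair."""
    return (inode.ino, inode.gen)


def reuse(inode: Inode) -> Inode:
    """Reuse an inode whose cluster was kept: same number, next generation."""
    return Inode(inode.ino, inode.gen + 1)


def reclaim_and_reallocate(inode: Inode) -> Inode:
    """Cluster freed and later allocated again: generation restarts at zero."""
    return Inode(inode.ino, 0)


first = Inode(ino=128, gen=0)
old_handle = make_handle(first)

# Cluster kept: the reused inode gets a handle that differs from the old one.
kept = reuse(first)
assert make_handle(kept) != old_handle

# Cluster freed and reallocated: the old handle comes back for a new file.
recycled = reclaim_and_reallocate(kept)
assert make_handle(recycled) == old_handle
```

Anything that stored old_handle would now resolve it to an unrelated file,
which is the failure mode described above.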
To avoid the problem with identical DMAPI handles, the XFSMNT_IDELETE mount
option should be set as default, only if the filesystem is not mounted with
XFSMNT_DMAPI.
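
As a rough sketch of that rule (in Python, not the actual change to
xfs_vfsops.c): the flag names come from this message, but the bit values and
the helper apply_default_flags are hypothetical.

```python
# Hypothetical bit values; the real XFSMNT_* constants are defined in the
# XFS headers and may differ.
XFSMNT_DMAPI = 0x1
XFSMNT_IDELETE = 0x2


def apply_default_flags(flags: int) -> int:
    """Enable inode cluster freeing by default unless DMAPI is in use."""
    if flags & XFSMNT_DMAPI:
        # DMAPI needs persistent generation numbers: keep empty clusters.
        return flags & ~XFSMNT_IDELETE
    return flags | XFSMNT_IDELETE
```

With these assumed values, apply_default_flags(0) sets XFSMNT_IDELETE, while
apply_default_flags(XFSMNT_DMAPI) leaves it clear.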
Date: Fri Aug 24 13:54:57 AEST 2007
Workarea: soarer.melbourne.sgi.com:/home/vapo/isms/linux-xfs
Inspected by: dgc, markgw
Author: vapo
The following file(s) were checked into:
longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb
Modid: xfs-linux-melb:xfs-kern:29486a
fs/xfs/xfs_vfsops.c - 1.527 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/xfs_vfsops.c.diff?r1=text&tr1=1.527&r2=text&tr2=1.526&f=h
- pv 969192, rv dgc, markgw - Imply "ikeep" default mount option
for DMAPI enabled filesystems.
* Re: TAKE 969192: Default mount option "noikeep" makes the inode generation number non-persistent
From: Christoph Hellwig @ 2007-08-24 11:36 UTC
To: Vlad Apostolov; +Cc: linux-xfs
On Fri, Aug 24, 2007 at 02:01:30PM +1000, Vlad Apostolov wrote:
> To avoid the problem with identical DMAPI handles, the XFSMNT_IDELETE mount
> option should be set as default, only if the filesystem is not mounted with
> XFSMNT_DMAPI.
Note that we have the same problem with NFS exports as well. So maybe we
need a real fix instead, and keep a block of generation numbers around even
if an inode cluster is freed, or something similar.
* Re: TAKE 969192: Default mount option "noikeep" makes the inode generation number non-persistent
From: David Chinner @ 2007-08-24 12:49 UTC
To: Christoph Hellwig; +Cc: Vlad Apostolov, linux-xfs
On Fri, Aug 24, 2007 at 12:36:31PM +0100, Christoph Hellwig wrote:
> On Fri, Aug 24, 2007 at 02:01:30PM +1000, Vlad Apostolov wrote:
> > To avoid the problem with identical DMAPI handles, the XFSMNT_IDELETE mount
> > option should be set as default, only if the filesystem is not mounted with
> > XFSMNT_DMAPI.
>
> Note that we have the same problem with NFS exports as well. So maybe we
> need a real fix instead, and keep a block of generation numbers around even
> if an inode cluster is freed, or something similar.
Yes. NFS is less critical than dmapi, though - with NFS filehandles just a
change in generation number is usually good enough to catch most stale
filehandle issues. With DMAPI, there are applications that record inode
number/generation pairs and expect them never to repeat ever again.
We haven't had any reports of problems with NFS servers due to this,
but as soon as our HSM was exposed to this code we started getting
strange coherency and corruption problems that have taken some time
to track down to this issue. Hence this change seems like the
best tradeoff while we work out a real solution.
At this point I suspect a deleted inode cluster btree in the AGI
is the best solution because it can share most of the btree
code with the current AGI btree and keeps the granularity of
shared generation numbers quite fine.
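
A toy model of that idea, in Python. A plain dict stands in for the proposed
deleted inode cluster btree, and the class and method names are hypothetical;
this only shows the bookkeeping, not an on-disk format.

```python
class AllocationGroup:
    """Toy allocation group that remembers generations of freed clusters."""

    def __init__(self):
        self.live = {}     # cluster start inode -> per-inode generations
        self.deleted = {}  # cluster start inode -> highest generation at free

    def alloc_cluster(self, start, size=64):
        # Resume one past the recorded generation if this cluster was freed
        # before; a never-seen cluster starts at generation 0.
        base = self.deleted.pop(start, -1) + 1
        self.live[start] = [base] * size

    def reuse_inode(self, start, index):
        # Each reuse of an inode bumps its generation.
        self.live[start][index] += 1
        return self.live[start][index]

    def free_cluster(self, start):
        # One shared number per cluster: the highest generation it reached.
        self.deleted[start] = max(self.live.pop(start))


ag = AllocationGroup()
ag.alloc_cluster(0)
ag.reuse_inode(0, 3)
ag.reuse_inode(0, 3)
ag.free_cluster(0)
ag.alloc_cluster(0)
# Every inode in the reallocated cluster starts above any old generation.
assert min(ag.live[0]) == 3
```

Recording a single number per freed cluster is what keeps the granularity at
the cluster level in this sketch.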
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: TAKE 969192: Default mount option "noikeep" makes the inode generation number non-persistent
From: Mark Goodwin @ 2007-08-27 6:22 UTC
To: David Chinner; +Cc: Christoph Hellwig, Vlad Apostolov, linux-xfs
David Chinner wrote:
> On Fri, Aug 24, 2007 at 12:36:31PM +0100, Christoph Hellwig wrote:
>> On Fri, Aug 24, 2007 at 02:01:30PM +1000, Vlad Apostolov wrote:
>>> To avoid the problem with identical DMAPI handles, the XFSMNT_IDELETE mount
>>> option should be set as default, only if the filesystem is not mounted with
>>> XFSMNT_DMAPI.
>> Note that we have the same problem with NFS exports as well. So maybe we
>> need a real fix instead, and keep a block of generation numbers around even
>> if an inode cluster is freed, or something similar.
>
> Yes. NFS is less critical than dmapi, though - with NFS filehandles just a
> change in generation number is usually good enough to catch most stale
> filehandle issues. With DMAPI, there are applications that record inode
> number/generation pairs and expect them never to repeat ever again.
>
> We haven't had any reports of problems with NFS servers due to this,
> but as soon as our HSM was exposed to this code we started getting
> strange coherency and corruption problems that have taken some time
> to track down to this issue. Hence this change seems like the
> best tradeoff while we work out a real solution.
>
> At this point I suspect a deleted inode cluster btree in the AGI
> is the best solution because it can share most of the btree
> code with the current AGI btree and keeps the granularity of
> shared generation numbers quite fine.
Having a persistent highest/shared generation number per inode cluster
only solves part of the problem - with only 32 bits of precision, eventually
it will wrap. Generation numbers need more precision to solve this
completely. With more precision, the starting value could simply be
based on a timestamp ...
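
To make the wrap concrete, here is a small Python sketch. The 64-bit layout
(timestamp in the high bits, counter in the low bits) is only one possible
reading of the suggestion and is an assumption, as are the function names.

```python
GEN_MASK_32 = 0xFFFFFFFF
GEN_MASK_64 = 0xFFFFFFFFFFFFFFFF


def next_gen32(gen: int) -> int:
    """Advance a 32-bit generation number; it wraps back to zero."""
    return (gen + 1) & GEN_MASK_32


def initial_gen64(now_seconds: int, counter_bits: int = 32) -> int:
    """Hypothetical wider generation seeded from a timestamp.

    The timestamp occupies the high bits, leaving counter_bits low bits
    for subsequent increments.
    """
    return (int(now_seconds) << counter_bits) & GEN_MASK_64


# A 32-bit counter eventually repeats values it has already handed out.
assert next_gen32(GEN_MASK_32) == 0

# A later allocation time gives a strictly larger starting value.
assert initial_gen64(2) > initial_gen64(1)
```

In this layout a reallocated cluster would not need a saved generation to
avoid repeats, provided the clock never goes backwards.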
--
Mark Goodwin markgw@sgi.com
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------