From: Niv Sardi <xaiki@cxhome.ath.cx>
To: David Chinner <dgc@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [Patch] unique per-AG inode generation number initialisation
Date: Wed, 02 Apr 2008 15:02:42 +1100 [thread overview]
Message-ID: <nccbq4slx99.fsf@sgi.com> (raw)
In-Reply-To: <20080401231815.GW103491721@sgi.com> (David Chinner's message of "Wed, 2 Apr 2008 09:18:15 +1000")
<HAT type=DMAPI>
David Chinner <dgc@sgi.com> writes:
> Don't initialise new inode generation numbers to zero
>
> When we allocation new inode chunks, we initialise the generation
> numbers to zero. This works fine until we delete a chunk and then
> reallocate it, resulting in the same inode numbers but with a
> reset generation count. This can result in inode/generation
> pairs of different inodes occurring relatively close together.
>
> Given that the inode/gen pair makes up the "unique" portion of
> an NFS filehandle on XFS, this can result in file handles cached
> on clients being seen on the wire from the server but refer to
> a different file. This causes .... issues for NFS clients.
>
> Hence we need a unique generation number initialisation for
> each inode to prevent reuse of a small portion of the generation
> number space. Make this initialiser per-allocation group so
> that it is not a single point of contention in the filesystem,
> and increment it on every allocation within an AG to reduce the
> chance that a generation number is reused for a given inode number
> if the inode chunk is deleted and reallocated immediately
> afterwards.
>
> It is safe to add the agi_newinogen field to the AGI without
> using a feature bit. If an older kernel is used, it simply
> will not update the field on allocation. If the kernel is
> updated and the field has garbage in it, then it's like having a
> random seed to the generation number....
>
> Signed-off-by: Dave Chinner <dgc@sgi.com>
> ---
> fs/xfs/xfs_ag.h | 4 +++-
> fs/xfs/xfs_ialloc.c | 30 ++++++++++++++++++++++--------
> 2 files changed, 25 insertions(+), 9 deletions(-)
Appart from the bit of overhead all seems good.
> Index: 2.6.x-xfs-new/fs/xfs/xfs_ag.h
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ag.h 2008-01-18 18:30:06.000000000 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ag.h 2008-03-26 13:03:41.122918236 +1100
> @@ -121,6 +121,7 @@ typedef struct xfs_agi {
> * still being referenced.
> */
> __be32 agi_unlinked[XFS_AGI_UNLINKED_BUCKETS];
> + __be32 agi_newinogen; /* inode cluster generation */
> } xfs_agi_t;
>
> #define XFS_AGI_MAGICNUM 0x00000001
> @@ -134,7 +135,8 @@ typedef struct xfs_agi {
> #define XFS_AGI_NEWINO 0x00000100
> #define XFS_AGI_DIRINO 0x00000200
> #define XFS_AGI_UNLINKED 0x00000400
> -#define XFS_AGI_NUM_BITS 11
> +#define XFS_AGI_NEWINOGEN 0x00000800
> +#define XFS_AGI_NUM_BITS 12
> #define XFS_AGI_ALL_BITS ((1 << XFS_AGI_NUM_BITS) - 1)
>
> /* disk block (xfs_daddr_t) in the AG */
> Index: 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/xfs_ialloc.c 2008-03-25 15:41:27.000000000 +1100
> +++ 2.6.x-xfs-new/fs/xfs/xfs_ialloc.c 2008-03-26 14:29:47.998554368 +1100
> @@ -309,6 +309,8 @@ xfs_ialloc_ag_alloc(
> free = XFS_MAKE_IPTR(args.mp, fbuf, i);
> free->di_core.di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
> free->di_core.di_version = version;
> + free->di_core.di_gen = agi->agi_newinogen;
> + be32_add_cpu(&agi->agi_newinogen, 1);
> free->di_next_unlinked = cpu_to_be32(NULLAGINO);
> xfs_ialloc_log_di(tp, fbuf, i,
> XFS_DI_CORE_BITS | XFS_DI_NEXT_UNLINKED);
> @@ -347,7 +349,8 @@ xfs_ialloc_ag_alloc(
> * Log allocation group header fields
> */
> xfs_ialloc_log_agi(tp, agbp,
> - XFS_AGI_COUNT | XFS_AGI_FREECOUNT | XFS_AGI_NEWINO);
> + XFS_AGI_COUNT | XFS_AGI_FREECOUNT |
> + XFS_AGI_NEWINO | XFS_AGI_NEWINOGEN);
> /*
> * Modify/log superblock values for inode count and inode free count.
> */
> @@ -896,11 +899,12 @@ nextag:
> ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset);
> XFS_INOBT_CLR_FREE(&rec, offset);
> rec.ir_freecount--;
> + be32_add_cpu(&agi->agi_newinogen, 1);
> if ((error = xfs_inobt_update(cur, rec.ir_startino, rec.ir_freecount,
> rec.ir_free)))
> goto error0;
> be32_add(&agi->agi_freecount, -1);
> - xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT);
> + xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT | XFS_AGI_NEWINOGEN);
> down_read(&mp->m_peraglock);
> mp->m_perag[tagno].pagi_freecount--;
> up_read(&mp->m_peraglock);
> @@ -1320,6 +1324,11 @@ xfs_ialloc_compute_maxlevels(
>
> /*
> * Log specified fields for the ag hdr (inode section)
> + *
> + * We don't log the unlinked inode fields through here; they
> + * get logged directly to the buffer. Hence we have a discontinuity
> + * in the fields we are logging and we need two calls to map all
> + * the dirtied parts of the agi....
> */
> void
> xfs_ialloc_log_agi(
> @@ -1342,22 +1351,27 @@ xfs_ialloc_log_agi(
> offsetof(xfs_agi_t, agi_newino),
> offsetof(xfs_agi_t, agi_dirino),
> offsetof(xfs_agi_t, agi_unlinked),
> + offsetof(xfs_agi_t, agi_newinogen),
> sizeof(xfs_agi_t)
> };
> + int log_newino = fields & XFS_AGI_NEWINOGEN;
> +
> #ifdef DEBUG
> xfs_agi_t *agi; /* allocation group header */
>
> agi = XFS_BUF_TO_AGI(bp);
> ASSERT(be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC);
> #endif
> - /*
> - * Compute byte offsets for the first and last fields.
> - */
> + fields &= ~XFS_AGI_NEWINOGEN;
> +
> + /* Compute byte offsets for the first and last fields. */
> xfs_btree_offsets(fields, offsets, XFS_AGI_NUM_BITS, &first, &last);
> - /*
> - * Log the allocation group inode header buffer.
> - */
> xfs_trans_log_buf(tp, bp, first, last);
> + if (log_newino) {
> + xfs_btree_offsets(XFS_AGI_NEWINOGEN, offsets, XFS_AGI_NUM_BITS,
> + &first, &last);
> + xfs_trans_log_buf(tp, bp, first, last);
> + }
> }
>
> /*
--
Niv Sardi
next prev parent reply other threads:[~2008-04-02 4:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-01 23:18 [Patch] unique per-AG inode generation number initialisation David Chinner
2008-04-02 4:02 ` Niv Sardi [this message]
2008-04-04 9:08 ` Hans-Peter Jansen
2008-04-07 12:57 ` Christoph Hellwig
2008-04-07 21:52 ` David Chinner
2008-04-10 4:34 ` David Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nccbq4slx99.fsf@sgi.com \
--to=xaiki@cxhome.ath.cx \
--cc=dgc@sgi.com \
--cc=xfs-dev@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox