From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 11/60] xfs: Inode create item recovery
Date: Wed, 19 Jun 2013 14:50:19 +1000
Message-ID: <1371617468-32559-12-git-send-email-david@fromorbit.com>
In-Reply-To: <1371617468-32559-1-git-send-email-david@fromorbit.com>

When we find an icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range,
initialising the buffers and queueing them for delayed write.

Support an arbitrarily sized range to initialise so that in
future, when we allocate inodes in much larger chunks, all kernels
that understand this transaction can still recover them.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_ialloc.c | 37 +++++++++++----
fs/xfs/xfs_ialloc.h | 8 ++++
fs/xfs/xfs_log_recover.c | 114 ++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 145 insertions(+), 14 deletions(-)
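
Note for readers (not part of the commit): the record verification
described in the commit message can be sketched in userspace as shown
here. This is only an illustration of the order and shape of the checks
done in xlog_recover_do_icreate_pass2(). The struct icreate_rec and
struct fs_geom types, the NULL_AGBLOCK constant and the
icreate_rec_valid() helper are hypothetical stand-ins, not kernel
interfaces, and the fields are assumed to have been converted to CPU
byte order already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already byte-swapped copy of the icreate record fields. */
struct icreate_rec {
	uint32_t agno;		/* allocation group number */
	uint32_t agbno;		/* first block of the inode chunk in the AG */
	uint32_t isize;		/* on-disk inode size */
	uint32_t count;		/* number of inodes to initialise */
	uint32_t length;	/* length of the extent, in blocks */
};

/* Hypothetical subset of the filesystem geometry used by the checks. */
struct fs_geom {
	uint32_t agcount;
	uint32_t agblocks;
	uint32_t inodesize;
	uint32_t chunk_inodes;	/* stands in for XFS_IALLOC_INODES(mp) */
	uint32_t chunk_blocks;	/* stands in for XFS_IALLOC_BLOCKS(mp) */
};

#define NULL_AGBLOCK	UINT32_MAX	/* stands in for NULLAGBLOCK */

/* Return true only if every field passes the same tests as the patch. */
static bool
icreate_rec_valid(const struct icreate_rec *rec, const struct fs_geom *geo)
{
	assert(rec != NULL && geo != NULL);

	if (rec->agno >= geo->agcount)
		return false;
	if (rec->agbno == 0 || rec->agbno == NULL_AGBLOCK ||
	    rec->agbno >= geo->agblocks)
		return false;
	if (rec->isize != geo->inodesize)
		return false;
	if (rec->count == 0)
		return false;
	if (rec->length == 0 || rec->length >= geo->agblocks)
		return false;

	/* Existing allocations use a fixed chunk size. */
	if (rec->count != geo->chunk_inodes ||
	    rec->length != geo->chunk_blocks)
		return false;
	return true;
}
```

The last test mirrors the "existing allocation is fixed value" check in
the patch. A kernel that allocates larger chunks would presumably relax
only that test and keep the range checks above it.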
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index c8f5ae1..d5ef81a 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -150,12 +150,16 @@ xfs_check_agi_freecount(
#endif
/*
- * Initialise a new set of inodes.
+ * Initialise a new set of inodes. When called without a transaction context
+ * (e.g. from recovery) we initiate a delayed write of the inode buffers rather
+ * than logging them (which in a transaction context puts them into the AIL
+ * for writeback rather than the xfsbufd queue).
*/
STATIC int
xfs_ialloc_inode_init(
struct xfs_mount *mp,
struct xfs_trans *tp,
+ struct list_head *buffer_list,
xfs_agnumber_t agno,
xfs_agblock_t agbno,
xfs_agblock_t length,
@@ -247,18 +251,33 @@ xfs_ialloc_inode_init(
ino++;
uuid_copy(&free->di_uuid, &mp->m_sb.sb_uuid);
xfs_dinode_calc_crc(mp, free);
- } else {
+ } else if (tp) {
/* just log the inode core */
xfs_trans_log_buf(tp, fbuf, ioffset,
ioffset + isize - 1);
}
}
- if (version == 3) {
- /* need to log the entire buffer */
- xfs_trans_log_buf(tp, fbuf, 0,
- BBTOB(fbuf->b_length) - 1);
+
+ if (tp) {
+ /*
+ * Mark the buffer as an inode allocation buffer so it
+ * sticks in AIL at the point of this allocation
+ * transaction. This ensures that they are on disk before
+ * the tail of the log can be moved past this
+ * transaction (i.e. by preventing relogging from moving
+ * it forward in the log).
+ */
+ xfs_trans_inode_alloc_buf(tp, fbuf);
+ if (version == 3) {
+ /* need to log the entire buffer */
+ xfs_trans_log_buf(tp, fbuf, 0,
+ BBTOB(fbuf->b_length) - 1);
+ }
+ } else {
+ fbuf->b_flags |= XBF_DONE;
+ xfs_buf_delwri_queue(fbuf, buffer_list);
+ xfs_buf_relse(fbuf);
}
- xfs_trans_inode_alloc_buf(tp, fbuf);
}
return 0;
}
@@ -303,7 +322,7 @@ xfs_ialloc_ag_alloc(
* First try to allocate inodes contiguous with the last-allocated
* chunk of inodes. If the filesystem is striped, this will fill
* an entire stripe unit with inodes.
- */
+ */
agi = XFS_BUF_TO_AGI(agbp);
newino = be32_to_cpu(agi->agi_newino);
agno = be32_to_cpu(agi->agi_seqno);
@@ -402,7 +421,7 @@ xfs_ialloc_ag_alloc(
* rather than a linear progression to prevent the next generation
* number from being easily guessable.
*/
- error = xfs_ialloc_inode_init(args.mp, tp, agno, args.agbno,
+ error = xfs_ialloc_inode_init(args.mp, tp, NULL, agno, args.agbno,
args.len, prandom_u32());
if (error)
diff --git a/fs/xfs/xfs_ialloc.h b/fs/xfs/xfs_ialloc.h
index c8da3df..68c0732 100644
--- a/fs/xfs/xfs_ialloc.h
+++ b/fs/xfs/xfs_ialloc.h
@@ -150,6 +150,14 @@ int xfs_inobt_lookup(struct xfs_btree_cur *cur, xfs_agino_t ino,
int xfs_inobt_get_rec(struct xfs_btree_cur *cur,
xfs_inobt_rec_incore_t *rec, int *stat);
+/*
+ * Inode chunk initialisation routine
+ */
+int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct list_head *buffer_list,
+ xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_agblock_t length, unsigned int gen);
+
extern const struct xfs_buf_ops xfs_agi_buf_ops;
#endif /* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 7cf5e4e..6fcc910a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -45,6 +45,7 @@
#include "xfs_cksum.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
+#include "xfs_icreate_item.h"
/* Need all the magic numbers and buffer ops structures from these headers */
#include "xfs_symlink.h"
@@ -1617,7 +1618,10 @@ xlog_recover_add_to_trans(
* form the cancelled buffer table. Hence they have tobe done last.
*
* 3. Inode allocation buffers must be replayed before inode items that
- * read the buffer and replay changes into it.
+ * read the buffer and replay changes into it. For filesystems using the
+ * ICREATE transactions, this means XFS_LI_ICREATE objects need to get
+ * treated the same as inode allocation buffers as they create and
+ * initialise the buffers directly.
*
* 4. Inode unlink buffers must be replayed after inode items are replayed.
* This ensures that inodes are completely flushed to the inode buffer
@@ -1632,10 +1636,17 @@ xlog_recover_add_to_trans(
* from all the other buffers and move them to last.
*
* Hence, 4 lists, in order from head to tail:
- * - buffer_list for all buffers except cancelled/inode unlink buffers
- * - item_list for all non-buffer items
- * - inode_buffer_list for inode unlink buffers
- * - cancel_list for the cancelled buffers
+ * - buffer_list for all buffers except cancelled/inode unlink buffers
+ * - item_list for all non-buffer items
+ * - inode_buffer_list for inode unlink buffers
+ * - cancel_list for the cancelled buffers
+ *
+ * Note that we add objects to the tail of the lists so that first-to-last
+ * ordering is preserved within the lists. Adding objects to the head of the
+ * list means when we traverse from the head we walk them in last-to-first
+ * order. For cancelled buffers and inode unlink buffers this doesn't matter,
+ * but for all other items there may be specific ordering that we need to
+ * preserve.
*/
STATIC int
xlog_recover_reorder_trans(
@@ -1655,6 +1666,9 @@ xlog_recover_reorder_trans(
xfs_buf_log_format_t *buf_f = item->ri_buf[0].i_addr;
switch (ITEM_TYPE(item)) {
+ case XFS_LI_ICREATE:
+ list_move_tail(&item->ri_list, &buffer_list);
+ break;
case XFS_LI_BUF:
if (buf_f->blf_flags & XFS_BLF_CANCEL) {
trace_xfs_log_recover_item_reorder_head(log,
@@ -2982,6 +2996,93 @@ xlog_recover_efd_pass2(
}
/*
+ * This routine is called when an inode create format structure is found in a
+ * committed transaction in the log. Its purpose is to initialise the inodes
+ * being allocated on disk. This requires us to get inode cluster buffers that
+ * match the range to be initialised, stamped with inode templates and written
+ * by delayed write so that subsequent modifications will hit the cached buffer
+ * and only need writing out at the end of recovery.
+ */
+STATIC int
+xlog_recover_do_icreate_pass2(
+ struct xlog *log,
+ struct list_head *buffer_list,
+ xlog_recover_item_t *item)
+{
+ struct xfs_mount *mp = log->l_mp;
+ struct xfs_icreate_log *icl;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ unsigned int count;
+ unsigned int isize;
+ xfs_agblock_t length;
+
+ icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
+ if (icl->icl_type != XFS_LI_ICREATE) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad type");
+ return EINVAL;
+ }
+
+ if (icl->icl_size != 1) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad icl size");
+ return EINVAL;
+ }
+
+ agno = be32_to_cpu(icl->icl_ag);
+ if (agno >= mp->m_sb.sb_agcount) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agno");
+ return EINVAL;
+ }
+ agbno = be32_to_cpu(icl->icl_agbno);
+ if (!agbno || agbno == NULLAGBLOCK || agbno >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad agbno");
+ return EINVAL;
+ }
+ isize = be32_to_cpu(icl->icl_isize);
+ if (isize != mp->m_sb.sb_inodesize) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad isize");
+ return EINVAL;
+ }
+ count = be32_to_cpu(icl->icl_count);
+ if (!count) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count");
+ return EINVAL;
+ }
+ length = be32_to_cpu(icl->icl_length);
+ if (!length || length >= mp->m_sb.sb_agblocks) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad length");
+ return EINVAL;
+ }
+
+ /* existing allocation is fixed value */
+ ASSERT(count == XFS_IALLOC_INODES(mp));
+ ASSERT(length == XFS_IALLOC_BLOCKS(mp));
+ if (count != XFS_IALLOC_INODES(mp) ||
+ length != XFS_IALLOC_BLOCKS(mp)) {
+ xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count 2");
+ return EINVAL;
+ }
+
+ /*
+ * Inode buffers can be freed. Do not replay the inode initialisation as
+ * we could be overwriting something written after this inode buffer was
+ * cancelled.
+ *
+ * XXX: we need to iterate all buffers and only init those that are not
+ * cancelled. I think that a more fine grained factoring of
+ * xfs_ialloc_inode_init may be appropriate here to enable this to be
+ * done easily.
+ */
+ if (xlog_check_buffer_cancelled(log,
+ XFS_AGB_TO_DADDR(mp, agno, agbno), length, 0))
+ return 0;
+
+ xfs_ialloc_inode_init(mp, NULL, buffer_list, agno, agbno, length,
+ be32_to_cpu(icl->icl_gen));
+ return 0;
+}
+
+/*
* Free up any resources allocated by the transaction
*
* Remember that EFIs, EFDs, and IUNLINKs are handled later.
@@ -3023,6 +3124,7 @@ xlog_recover_commit_pass1(
case XFS_LI_EFI:
case XFS_LI_EFD:
case XFS_LI_DQUOT:
+ case XFS_LI_ICREATE:
/* nothing to do in pass 1 */
return 0;
default:
@@ -3053,6 +3155,8 @@ xlog_recover_commit_pass2(
return xlog_recover_efd_pass2(log, item);
case XFS_LI_DQUOT:
return xlog_recover_dquot_pass2(log, buffer_list, item);
+ case XFS_LI_ICREATE:
+ return xlog_recover_do_icreate_pass2(log, buffer_list, item);
case XFS_LI_QUOTAOFF:
/* nothing to do in pass2 */
return 0;
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs